[Pharo-dev] Better management of encoding of environment variables
David T. Lewis
lewis at mail.msen.com
Sun Jan 27 09:30:22 EST 2019
On Fri, Jan 18, 2019 at 08:58:07AM -0500, David T. Lewis wrote:
> On Fri, Jan 18, 2019 at 01:40:26PM +0100, Sven Van Caekenberghe wrote:
> > > On 18 Jan 2019, at 01:54, David T. Lewis via Pharo-dev <pharo-dev at lists.pharo.org> wrote:
> > >
> > > On Thu, Jan 17, 2019 at 04:57:18PM +0100, Sven Van Caekenberghe wrote:
> > >>
> > >> Right, bytes are always uninterpreted, else they would be something else.
> > >> We got ByteArray>>#decodedWith: and ByteArray>>#utf8Decoded and our ByteArray
> > >> inspector decodes automatically if it can.
> > >
> > > Hi Sven,
> > >
> > > I am the author of the getenv primitives, and I am also sadly uninformed
> > > about matters of character sets and strings in a multilingual environment.
> > >
> > > The primitives answer environment variable values as ByteString
> > > rather than ByteArray. This made sense to me at the time that I wrote it,
> > > because ByteString is easy to display in an inspector, and because it is
> > > easily converted to ByteArray.
> > >
> > > For an American English speaker this seems like a good choice, but I
> > > wonder now if it is a bad decision. After all, it is also trivially easy
> > > to convert a ByteArray to ByteString for display in the image.
> > >
> > > Would it be helpful to have getenv primitives that answer ByteArray
> > > instead, and to let all conversion (including in OSProcess) be done in
> > > the image?
> > >
> > > Thanks,
> > > Dave
> >
> > Normally, the correct way to represent uninterpreted bytes is with a
> > ByteArray. Decoding these bytes as characters is the specific task of
> > a character encoder/decoder, with a deliberate choice as to which to use.
> >
> > Since the getenv() C library function uses simple C strings, it is
> > understandable that this was carried over. It is probably not worth
> > changing that, or too risky to do so, as long as the receiver understands
> > that it is a raw OS string that needs more work.
> >
> > Like file path encoding/decoding, environment variable encoding/decoding
> > is plain messy and complex. IMHO it is better to manage that at the
> > image level where we are more agile and can better handle that complexity.
> >
> Thanks Sven, that makes perfect sense to me.
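
To illustrate why the choice of decoder matters, here is a small sketch that reads the same bytes in two ways. It assumes a Pharo image in which ByteArray>>#utf8Decoded (mentioned above) is available; the sample bytes are made up for illustration and do not come from any real environment variable.

```smalltalk
| rawBytes asUtf8 asBytePerChar |
"UTF-8 encoding of 'café': the last character takes two bytes (195 169)."
rawBytes := #[99 97 102 195 169].

"Explicit decoding with a deliberately chosen encoder."
asUtf8 := rawBytes utf8Decoded.

"One character per byte, which is effectively what a ByteString answer gives."
asBytePerChar := rawBytes asString.

"The two readings disagree about how many characters there are."
asUtf8 size.          "4"
asBytePerChar size.   "5"
```

The one-character-per-byte reading only matches the decoded string when every byte is plain ASCII, which is why a ByteString answer looks fine for English text and goes wrong elsewhere.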
I added some new primitives to OSProcessPlugin that answer ByteArray instead of ByteString.
For Unix (Linux, OS X):
<primitive: 'primitiveGetCurrentWorkingDirectoryAsBytes' module: 'UnixOSProcessPlugin'>
<primitive: 'primitiveArgumentAtAsBytes' module: 'UnixOSProcessPlugin'>
<primitive: 'primitiveEnvironmentAtAsBytes' module: 'UnixOSProcessPlugin'>
<primitive: 'primitiveEnvironmentAtSymbolAsBytes' module: 'UnixOSProcessPlugin'>
<primitive: 'primitiveRealpathAsBytes' module: 'UnixOSProcessPlugin'>
For Windows:
<primitive: 'primitiveGetCurrentWorkingDirectoryAsBytes' module: 'Win32OSProcessPlugin'>
<primitive: 'primitiveGetEnvironmentStringsAsBytes' module: 'Win32OSProcessPlugin'>
These should be in the latest VM builds now.
If you are using OSProcess, update it to the latest version to get accessor methods
for the new primitives. For example, OSProcess accessor primGetCurrentWorkingDirectory
calls the original primitive that answers a ByteString, and to get raw bytes
you can use OSProcess accessor primGetCurrentWorkingDirectoryAsBytes instead.
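
As a rough sketch of how this might be used from the image, the snippet below fetches the raw bytes and decodes them explicitly. It assumes that OSProcess is loaded, that the VM includes the new primitives, that the platform encodes paths as UTF-8, and that the accessor answers nil when the primitive is not available; ByteArray>>#utf8Decoded is the Pharo selector mentioned earlier in this thread and may not exist in other images.

```smalltalk
| rawBytes path |
"Uninterpreted bytes, exactly as the OS supplied them."
rawBytes := OSProcess accessor primGetCurrentWorkingDirectoryAsBytes.

"Decode in the image with an explicit choice of encoding.
 UTF-8 is an assumption here, and so is the nil answer on primitive failure."
path := rawBytes ifNotNil: [:bytes | bytes utf8Decoded].
```

On a platform that uses a different encoding, the decoding step is the only part that needs to change; the primitive itself stays encoding-agnostic.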