[Pharo-dev] Better management of encoding of environment variables

Sven Van Caekenberghe sven at stfx.eu
Fri Jan 18 07:40:26 EST 2019


> On 18 Jan 2019, at 01:54, David T. Lewis via Pharo-dev <pharo-dev at lists.pharo.org> wrote:
> From: "David T. Lewis" <lewis at mail.msen.com>
> Subject: Re: [Pharo-dev] Better management of encoding of environment variables
> Date: 18 January 2019 at 01:54:34 GMT+1
> To: Pharo Development List <pharo-dev at lists.pharo.org>
> On Thu, Jan 17, 2019 at 04:57:18PM +0100, Sven Van Caekenberghe wrote:
>>> On 16 Jan 2019, at 23:23, Eliot Miranda <eliot.miranda at gmail.com> wrote:
>>> On Wed, Jan 16, 2019 at 2:37 AM Sven Van Caekenberghe <sven at stfx.eu> wrote:
>>> The image side is perfectly capable of dealing with platform differences
>>> in a clean/clear way, and at least we can then use the full power of our
>>> language and our tools.
>> Agreed.  At the same time I think it is very important that we don't rely
>> on the FFI for environment variable access.  This is a basic cross-platform
>> facility.  So I would like to see the environment accessed through primitives,
>> but have the image place interpretation on the result of the primitive(s),
>> and have the primitive(s) answer a raw result, just a sequence of uninterpreted
>> bytes.
>> OK, I can understand that ENV VAR access is more fundamental than FFI
>> (although FFI is already essential for Pharo, also during startup).
>>> VisualWorks takes this approach and provides a class UninterpretedBytes
>>> that the VM is aware of.  That's always seemed like an ugly name and
>>> overkill to me.  I would just use ByteArray and provide image level
>>> conversion from ByteArray to String, which is what I believe we have anyway.
>> Right, bytes are always uninterpreted, else they would be something else.
>> We got ByteArray>>#decodedWith: and ByteArray>>#utf8Decoded and our ByteArray
>> inspector decodes automatically if it can.
> Hi Sven,
> I am the author of the getenv primitives, and I am also sadly uninformed
> about matters of character sets and strings in a multilingual environment.
> The primitives answer environment variable values as ByteString
> rather than ByteArray. This made sense to me at the time that I wrote it,
> because ByteString is easy to display in an inspector, and because it is
> easily converted to ByteArray.
> For an American English speaker this seems like a good choice, but I
> wonder now if it is a bad decision. After all, it is also trivially easy
> to convert a ByteArray to ByteString for display in the image.
> Would it be helpful to have getenv primitives that answer ByteArray
> instead, and to let all conversion (including in OSProcess) be done in
> the image?
> Thanks,
> Dave

Normally, the correct way to represent uninterpreted bytes is with a ByteArray. Decoding these bytes as characters is the specific task of a character encoder/decoder, with a deliberate choice as to which to use.

Since getenv() is a C library function that deals in plain C strings, it is understandable that this was carried over. It is probably not worth changing, or too risky to change, as long as the receiver understands that it is a raw OS string that still needs decoding.

As with file path encoding/decoding, environment variable encoding/decoding is plain messy and complex. IMHO it is better to manage that at the image level, where we are more agile and can better handle that complexity.
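
A minimal sketch of what image-level decoding looks like, assuming a Pharo image with the Zinc character encoders loaded. The byte values are made up for illustration (the UTF-8 encoding of 'é') and do not come from a real getenv call, and 'latin1' is assumed to be an accepted encoder identifier:

```smalltalk
| raw asUtf8 asLatin1 |
"Hypothetical raw value, as a primitive might answer it: the UTF-8 bytes for 'é'."
raw := #[195 169].

"Deliberate choice 1: decode as UTF-8, giving a one-character string."
asUtf8 := raw utf8Decoded.

"Deliberate choice 2: decode the same bytes as Latin-1 (identifier assumed),
 giving a different, two-character string."
asLatin1 := raw decodedWith: 'latin1'.

"Encoding is the inverse operation and round-trips to the original bytes."
asUtf8 utf8Encoded = raw.
```

The bytes are identical in both cases; only the chosen decoder determines which characters come out, which is why that choice is better made explicitly in the image.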


BTW: using funny Unicode chars, like 🎈 [https://www.fileformat.info/info/unicode/char/1f388/index.htm], is something even English speakers do.
