[Pharo-dev] Better management of encoding of environment variables

Guillermo Polito guillermopolito at gmail.com
Fri Jan 18 05:04:35 EST 2019


On Fri, Jan 18, 2019 at 1:58 AM David T. Lewis via Pharo-dev <
pharo-dev at lists.pharo.org> wrote:

> On Thu, Jan 17, 2019 at 04:57:18PM +0100, Sven Van Caekenberghe wrote:
> >
> > > On 16 Jan 2019, at 23:23, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> > >
> > > On Wed, Jan 16, 2019 at 2:37 AM Sven Van Caekenberghe <sven at stfx.eu> wrote:
> > >
> > > The image side is perfectly capable of dealing with platform differences
> > > in a clean/clear way, and at least we can then use the full power of our
> > > language and our tools.
> > >
> > Agreed.


+1

> > > At the same time I think it is very important that we don't rely
> > > on the FFI for environment variable access.  This is a basic
> > > cross-platform facility.  So I would like to see the environment
> > > accessed through primitives, but have the image place interpretation
> > > on the result of the primitive(s), and have the primitive(s) answer a
> > > raw result, just a sequence of uninterpreted bytes.
>

Having looked at it not so long ago, I'll add my 2cts.

Environment access is a very particular scenario.
We have in Pharo many startup actions that directly or indirectly
(FileLocator home?) require environment variable access, and thus we have
to be really careful and picky to make sure that they all work,
dependencies are installed in the right order and so on...

In Pharo6 this was especially difficult because FFI was dynamically
compiling methods,
  => which required access to argument names,
     => which required access to the sources files,
       => which required access to the env vars (because in Pharo the
          source/changes files are looked up in directories other than
          the image/VM ones)
          => which loops :)

In Pharo7 argument names in FFI calls are embedded in the method meta-data
so all that is avoided.

Still I'd agree that moving this support to a primitive would make it less
fragile.
I'd apply the same to getting/setting the working directory.

>
> > OK, I can understand that ENV VAR access is more fundamental than FFI
> > (although FFI is already essential for Pharo, also during startup).
> >
> > > VisualWorks takes this approach and provides a class UninterpretedBytes
> > > that the VM is aware of.  That's always seemed like an ugly name and
> > > overkill to me.  I would just use ByteArray and provide image level
> > > conversion from ByteArray to String, which is what I believe we have
> > > anyway.
> >
> > Right, bytes are always uninterpreted, else they would be something else.
> > We got ByteArray>>#decodedWith: and ByteArray>>#utf8Decoded and our
> > ByteArray inspector decodes automatically if it can.
> >
>
> Hi Sven,
>
> I am the author of the getenv primitives, and I am also sadly uninformed
> about matters of character sets and strings in a multilingual environment.
>
> The primitives answer environment variable values as ByteString
> rather than ByteArray. This made sense to me at the time that I wrote it,
> because ByteString is easy to display in an inspector, and because it is
> easily converted to ByteArray.
>
> For an American English speaker this seems like a good choice, but I
> wonder now if it is a bad decision.


Well, as soon as you want to manage some internationalisation, indeed it is.
But it is also a source of bugs, because assuming ASCII is not right for
English either.
Most platforms will assume UTF-8 by default, and the encoding is not the
same for many symbols :).

For example,

Character allByteCharacters size. => 256
Character allByteCharacters utf8Encoded size. => 384

Character allByteCharacters select: [ :c |
c asString utf8Encoded size > 1 ].

'€ ‚ƒ„…†‡ˆ‰Š‹Œ Ž ‘’“”•–—˜™š›œ žŸ
¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ'

Of course, many of those characters may not be used in the day-to-day of
many people, but as soon as we find one of those, things break (I'm
thinking about the not super strange case of a database storing names :)).
Also think about the poor Windows users (like myself for the past 2 weeks),
who have to think about UTF-16!
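
To make the "database storing names" case concrete, here is a minimal
playground sketch. It assumes only String>>utf8Encoded, ByteArray>>asString
and ByteArray>>utf8Decoded; the name and the variable names are made up for
illustration.

```smalltalk
"The same name seen as characters, as UTF-8 bytes, and as UTF-8 bytes
 wrongly read one byte per character."
| name utf8Bytes misread decoded |
name := 'Jos' , (String with: (Character value: 233)).  "J o s e-acute"
utf8Bytes := name utf8Encoded.     "5 bytes: the last character takes two"
misread := utf8Bytes asString.     "5 characters: every byte became a character"
decoded := utf8Bytes utf8Decoded.  "4 characters again, equal to name"
```

Only the last step gives back the original name; the asString step silently
turns the accented letter into two unrelated characters.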

BTW, I hope I'm not breaking anybody's mail client by pasting strange
characters here :D (and if so, you may want to suggest that they review how
they manage encoding :))

> After all, it is also trivially easy
> to convert a ByteArray to ByteString for display in the image.
>

Yes, but it's sometimes difficult to find such places, as there are many
primitives spread over a lot of places doing the wrong thing, which is a
source of bugs...
I'd like to fix it from the root; the question is how to do it without
breaking things ^^.
In Pharo we are doing this in many places:

self primitiveXXX asByteArray utf8Decoded

So making the primitives return ByteArray instances instead of ByteString
should be safe enough :).
But this is, in my opinion, clearly a hack rather than a fix for the real
problem, and we have to be careful to guard such patterns with comments
everywhere explaining why the ByteArray conversion is really needed there...
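
As a rough illustration of why that pattern should survive a change of the
primitive's return type, here is a playground sketch. No real primitive is
called: the two values below are stand-ins for what a primitive might answer,
and the names are hypothetical.

```smalltalk
| asBytesResult asStringResult decode |
"Stand-ins for a primitive result: UTF-8 bytes for J o s e-acute,
 once as a ByteArray and once mistyped as a ByteString."
asBytesResult := #[74 111 115 195 169].
asStringResult := asBytesResult asString.
"The guard pattern: force bytes first, then decode explicitly."
decode := [ :primitiveResult | primitiveResult asByteArray utf8Decoded ].
(decode value: asStringResult) = (decode value: asBytesResult)
```

Both calls should answer the same four-character string, since asByteArray
gives the same bytes whether the receiver is a ByteString or already a
ByteArray.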



> Would it be helpful to have getenv primitives that answer ByteArray
> instead, and to let all conversion (including in OSProcess) be done in
> the image?
>

Well, personally I would like getenv/setenv and getcwd/setcwd support to
live not in a plugin but as a basic service provided by the VM.

Cheers,
Guille


>
> Thanks,
> Dave
>
>
>

-- 



Guille Polito

Research Engineer

Centre de Recherche en Informatique, Signal et Automatique de Lille

CRIStAL - UMR 9189

French National Center for Scientific Research - http://www.cnrs.fr


Web: http://guillep.github.io

*Phone: *+33 06 52 70 66 13