[Pharo-dev] Better management of encoding of environment variables
guillermopolito at gmail.com
Fri Jan 18 08:23:38 EST 2019
On Fri, Jan 18, 2019 at 1:48 PM Ben Coman via Pharo-dev <
pharo-dev at lists.pharo.org> wrote:
> On Wed, 16 Jan 2019 at 18:37, Sven Van Caekenberghe <sven at stfx.eu> wrote:
>> Still, one of the conclusions of previous discussions about the encoding
>> of environment variables was/is that there is no single correct solution.
>> OS's are not consistent in how the encoding is done in all (historical)
>> contexts (like sometimes,
>> 1 env var defines the encoding to use for others,
> ouch. That one point nearly made me retract my comment next paragraph,
> but is there much more complexity?
> or just a case of utf8<==>appSpecificEncoding rather than
> ascii<==>appSpecificEncoding ?
It's not muuuuch more complex. The problem is that usually the bugs that
arise from wrongly managing such conversions can be super obscure.
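
To illustrate how quiet such a bug can be, here is a minimal sketch (in Python rather than Pharo, purely for illustration; the value "café" is a made-up example, not a real variable from any system). Bytes that are really UTF-8 get decoded as Latin-1: nothing fails, the text is simply wrong.

```python
# Hypothetical raw bytes of an environment variable, as handed over by the OS.
raw = "café".encode("utf-8")

# Wrong assumption: the bytes are Latin-1. No error is raised here,
# because every byte value is a valid Latin-1 character.
wrong = raw.decode("latin-1")

# Right assumption: the bytes are UTF-8.
right = raw.decode("utf-8")

# The wrong decoding yields one extra character and garbled text.
print(len(wrong), len(right))
print(repr(wrong), repr(right))
```

The wrongly decoded string only shows up as garbage much later, for instance when it is displayed or compared against a correctly decoded one.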
> Sorry if I'm rehashing past discussion (do you have a link?), but
> * 92% of web pages are UTF8 encoded such that pragmatically UTF8 *is*
> the standard for text
> * Strings are so pervasive in a system
> ...would there be an overall benefit to adopting UTF8 as the encoding
> consistently provided across the cross-platform VM interface?
> (i.e. fixing platforms that don't comply to the standard due to their
> historical baggage)
> And I found it interesting Microsoft are making some moves towards UTF8
> "With insider build 17035 and the April 2018 update (nominal build 17134)
> for Windows 10, a "Beta: Use Unicode UTF-8 for worldwide language support"
> checkbox appeared for setting the locale code page to UTF-8.[a] This allows
> for calling "narrow" functions, including fopen and SetWindowTextA, with
> UTF-8 strings. "
> The approach vm-side could be similar to Section 10 How to do text on
> Windows 
> with the philosophy of "performing the [conversions] as close to API calls
> as possible,
> and never holding the [converted] data."
>  https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows
>  http://utf8everywhere.org/
>> different applications do different things, and other such nice stuff),
>> and certainly not across platforms.
>> So this is really complex.
>> Do we want to hide this in some obscure VM C code that very few people
>> can see, read, let alone help with ?
>> The image side is perfectly capable of dealing with platform differences
>> in a clean/clear way, and at least we can then use the full power of our
>> language and our tools.
> Big question... Do we currently have primitives of the same name returning
> different encodings on different platforms? I presume that would be
> If the image is to handle encoding differences, should separate primitives be
> used? e.g. utf8GetEnv & utf16getEnv
> Could I get some feedback on  saying... **The Single Most Important
> Fact About Encodings**
> If you completely forget everything I just explained, please remember one
> extremely important fact.
> It does not make sense to have a string without knowing what encoding it
> uses. "
> And so... does our String nowadays require an 'encoding' instance variable
> such that this is *always* associated?
> This might remove any need for separate utf8GetEnv & utf16getEnv (if that
> was even a reasonable idea).
I think that will just overcomplicate things. Right now, all Strings in
Pharo are Unicode strings. Characters are represented with their
corresponding Unicode code point.
If all characters in a string have code points < 256 then they are just
stored in a ByteString. Otherwise they are stored in a WideString.
I think assuming a single representation for strings, and then encoding when
interacting with external apps/APIs is MUCH simpler.
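
A rough sketch of that idea, written in Python only for illustration (the helper names `fits_in_byte_string`, `to_external` and `from_external` are hypothetical and not Pharo or VM APIs): strings stay sequences of code points internally, and an encoding is chosen only at the boundary with an external API.

```python
def fits_in_byte_string(text):
    """True if every code point is < 256, the case described above
    as being stored in a byte-per-character string."""
    return all(ord(ch) < 256 for ch in text)


def to_external(text, encoding):
    """Encode only when handing the string to an external API."""
    return text.encode(encoding)


def from_external(data, encoding):
    """Decode as soon as bytes come back from an external API."""
    return data.decode(encoding)


name = "café"
print(fits_in_byte_string(name))      # all code points below 256
print(fits_in_byte_string("日本語"))   # needs a wide representation

# Same internal string, different external encodings per platform API.
utf8_bytes = to_external(name, "utf-8")
utf16_bytes = to_external(name, "utf-16-le")
print(from_external(utf8_bytes, "utf-8") == from_external(utf16_bytes, "utf-16-le"))
```

With this split, the string itself never needs to carry an encoding; only the call site that talks to the outside world has to know which one applies.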
> cheers -ben
>> > On 16 Jan 2019, at 10:59, Guillermo Polito <guillermopolito at gmail.com>
>> > Hi Nicolas,
>> > On Wed, Jan 16, 2019 at 10:25 AM Nicolas Cellier <
>> nicolas.cellier.aka.nice at gmail.com> wrote:
>> > IMO, windows VM (and plugins) should do the UCS2 -> UTF8 conversion
>> because the purpose of a VM is to provide an OS independent façade.
>> > I made progress recently in this area, but we should finish the
>> > I'm following your changes for windows from the shadows and I think
>> they are awesome :).
>> > If someone bypasses the VM and uses the Windows API directly through FFI, then he
>> takes the responsibility, but uniformity doesn't hurt.
>> > So far we are using FFI for this, as you say we create first
>> Win32WideStrings from utf8 strings and then we use ffi calls to the *W
>> > I don't think we can make it for Pharo7.0.0. The cycle to build, do
>> some acceptance tests, and then bless a new VM as stable is far too long
>> for our imminent release :).
>> > But this could be for a 7.1.0, and if you like I can surely give a hand
>> on this.
>> > Guille
Centre de Recherche en Informatique, Signal et Automatique de Lille
CRIStAL - UMR 9189
French National Center for Scientific Research - http://www.cnrs.fr
Web: http://guillep.github.io
Phone: +33 06 52 70 66 13