[Pharo-dev] Better management of encoding of environment variables
Sven Van Caekenberghe
sven at stfx.eu
Fri Jan 18 09:05:33 EST 2019
> On 18 Jan 2019, at 14:45, Ben Coman <btc at openinworld.com> wrote:
> On Fri, 18 Jan 2019 at 21:39, Sven Van Caekenberghe <sven at stfx.eu> wrote:
> > On 18 Jan 2019, at 14:23, Guillermo Polito <guillermopolito at gmail.com> wrote:
> > I think that will just overcomplicate things. Right now, all Strings in Pharo are Unicode strings.
> Cool. I didn't realise that. But to be pedantic, which Unicode encoding?
> Should I presume from Sven's "UTF-8 encoding step" comment below
> and the WideString class comment "This class represents the array of 32 bit wide characters"
> that the WideString encoding is UTF-32? So should its comment be updated to advise that?
Not really. Pharo Strings are collections of Characters, each of which is a Unicode code point (yes, a 32-bit one).
An encoding projects this rather abstract notion onto a sequence of bytes;
UTF-32 (ZnUTF32Encoder, https://en.wikipedia.org/wiki/UTF-32), for example, is endian-dependent.
Read the first part of
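To make the code point vs. encoding distinction concrete, here is a small sketch in Python rather than Pharo (an assumption of this illustration: Python's `str` is likewise a sequence of Unicode code points, so the analogy should carry over). The same three characters have fixed code points, but UTF-32 turns them into different byte sequences depending on byte order.

```python
# A string is a sequence of abstract code points; an encoding maps it to bytes.
text = "\u00e9\u20ac\U0001F600"  # "é€😀"

# The code points themselves carry no byte order.
code_points = [ord(ch) for ch in text]
print([hex(cp) for cp in code_points])  # ['0xe9', '0x20ac', '0x1f600']

# UTF-32 in two byte orders: same code points, different byte sequences.
little = text.encode("utf-32-le")
big = text.encode("utf-32-be")
print(little.hex(" "))  # e9 00 00 00 ac 20 00 00 00 f6 01 00
print(big.hex(" "))     # 00 00 00 e9 00 00 20 ac 00 01 f6 00

# Plain "utf-32" prepends a byte order mark (BOM) so a decoder can tell
# which byte order was used.
with_bom = text.encode("utf-32")
print(len(with_bom))  # 16: 4 bytes of BOM + 3 characters * 4 bytes
```

Both byte sequences decode back to the identical string; only the projection onto bytes differs.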
> cheers -ben
> > Characters are represented with their corresponding Unicode code point.
> > If all characters in a string have code points < 256, they are just stored in a ByteString. Otherwise they are WideStrings.
> > I think assuming a single representation for strings, and then encode when interacting with external apps/APIs is MUCH simpler.
> Absolutely !
> (and yes I know that for outgoing FFI calls that might mean a UTF-8 encoding step, so be it).
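The approach described in this thread (one in-memory representation, encode only when talking to external code) can be sketched in Python as an analogy. `storage_class_for` and `to_c_string` are hypothetical helpers, not Pharo or UnifiedFFI API; the 256 threshold follows the ByteString/WideString rule quoted above, and `ctypes.create_string_buffer` merely stands in for whatever marshalling an outgoing FFI call would perform.

```python
import ctypes


def storage_class_for(text: str) -> str:
    """Hypothetical helper mirroring the rule quoted above: narrow storage
    if every code point is below 256, otherwise wide storage."""
    if all(ord(ch) < 256 for ch in text):
        return "ByteString"
    return "WideString"


def to_c_string(text: str) -> ctypes.Array:
    """Hypothetical boundary step: encode to UTF-8 only when leaving the
    image. create_string_buffer copies the bytes and appends a NUL, giving
    a buffer that could be passed where C expects a char *."""
    return ctypes.create_string_buffer(text.encode("utf-8"))


for sample in ["caf\u00e9", "\u20ac10", "plain"]:
    buf = to_c_string(sample)
    # ascii() keeps the output printable on any console encoding.
    print(ascii(sample), storage_class_for(sample), buf.raw)
```

Note that `"caf\u00e9"` qualifies for narrow storage, yet its UTF-8 form is one byte longer: é fits in a single byte when stored by code point, but needs two bytes in UTF-8. That is why the outgoing encoding step is needed even for strings that never become WideStrings.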