[Pharo-dev] help about codeImporter

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Fri Dec 6 09:15:57 EST 2013


But MC should work better now that sources have been UTF8 encoded (for a few
months now).

The problem with old Squeak/Pharo/MC is that the encoding switched from
iso-8859-1 (latin1) to UTF32 as soon as a wide character was encountered...
But this wasn't done properly: with the ugly text converters, basicNextPut:
et al., the generated stuff was indeed UTF32, but only N bytes were
written instead of N characters!!! That means you only stored (and can
retrieve) the first 1/4 of the source...
But you may have more luck, because the ugliness did not stop there: it's
possible that the first buffers (4096 bytes) were already sent in latin1
encoding, and the following ones in UTF32 (with the size bug). In that case
you can retrieve a bit more of your sources.
I have a prototype to decode such messy sources, but haven't published it,
since you can't recover the whole code anyway.
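
Here is a similar hypothetical sketch of the mixed case, again in Python. It is not the unpublished prototype mentioned above, and it rests on three assumptions:
- Whole buffers before the first wide character were flushed as latin1.
- The remainder went through the buggy UTF-32 path.
- The decoder knows (or has guessed) where the latin1 part ends.

All names are made up for illustration. The demo uses an 8-byte buffer instead of 4096 so the effect shows up on a short string.

```python
# Hypothetical model of the mixed latin1 / truncated-UTF-32 output.
# Assumption: big-endian UTF-32 without a BOM after the switch.

def flushed_prefix_length(source: str, buffer_size: int) -> int:
    # Positions of characters that do not fit in latin1 (code point > 0xFF).
    wide = [i for i, ch in enumerate(source) if ord(ch) > 0xFF]
    if not wide:
        return len(source)  # no switch: everything stays latin1
    # Only whole buffers were flushed before the encoding switch.
    return (wide[0] // buffer_size) * buffer_size


def buggy_mixed_write(source: str, buffer_size: int) -> bytes:
    cut = flushed_prefix_length(source, buffer_size)
    head = source[:cut].encode("latin-1")                # flushed correctly
    tail = source[cut:]
    return head + tail.encode("utf-32-be")[: len(tail)]  # size bug on the rest


def recover_mixed(stored: bytes, latin1_length: int) -> str:
    # Assumption: the caller knows or guesses the latin1/UTF-32 boundary.
    head = stored[:latin1_length].decode("latin-1")
    tail = stored[latin1_length:]
    tail = tail[: len(tail) - len(tail) % 4]  # keep whole 4-byte units only
    return head + tail.decode("utf-32-be")


source = "x" * 20 + "\u20ac" + "y" * 19  # 40 characters, wide char at index 20
cut = flushed_prefix_length(source, buffer_size=8)
stored = buggy_mixed_write(source, buffer_size=8)
recovered = recover_mixed(stored, cut)
print(len(source), cut, len(recovered))  # → 40 16 22
```

With the plain size bug, the same 40-character string would yield only 10 characters. Here 22 survive, which is the "bit more" described above.
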

If you ever have problems with recent MC and improper UTF8, please, please
report them.



2013/12/6 Stephan Eggermont <stephan at stack.nl>

> Ben wrote:
> >who put a ô in the code in the first place? :P
>
> Doesn’t happen often, I’m happy to observe. Strings in code
> with interesting characters are a much more common problem,
> though. Made it impossible to import MCs into Gemstone.
>
> Stephan
>