[Pharo-dev] Character>>#leadingChar

Henrik Johansen henrik.s.johansen at veloxit.no
Mon Oct 21 05:18:29 EDT 2013

On Oct 18, 2013, at 6:34, Sven Van Caekenberghe <sven at stfx.eu> wrote:

> Hi,
> So once again we have an issue with Character>>#leadingChar, see
>  https://pharo.fogbugz.com/f/cases/6368
> Do we really need this ?
> Any Japanese, Chinese or Korean users willing to comment ?
> Thx,
> Sven

I'm not any of those, but my short answer would be no.

As for the long answer:
leadingChar has too many responsibilities. It currently encodes:
- the character set of a string
- font selection (see StrikeFontSet)
- Han unification disambiguation (through the above font selection)

The conflation of these, and the confusion about which of them a given leadingChar actually implies, easily leads to bugs, and has done so already (see Character >> asUnicode as opposed to JapaneseEnvironment >> fromJISX0208String:, for example).
I would bet 100€ that StrikeFontSet no longer works as intended either, its intent being to display glyphs beyond Latin-1 using StrikeFonts.
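To make the conflation concrete, here is a rough Python model (not Pharo code) of how a tag and a code can share one character value. It assumes the layout inherited from Squeak's multilingual support, with the leading char in bits 22 to 29 and the code in the low 22 bits; check Character>>leadingChar in your own image before relying on it. The tag values and helper names are hypothetical. Two characters with the same code compare unequal once their tags differ, and nothing in the packed value says whether the tag means a character set, a font hint or a language.

```python
# Rough model of a leadingChar-tagged character value.
# ASSUMPTION: Squeak-derived layout, leading char in bits 22..29 and the
# code in the low 22 bits. Tag values and function names are hypothetical.

LEADING_CHAR_SHIFT = 22
CHAR_CODE_MASK = 0x3FFFFF         # low 22 bits
LEADING_CHAR_MASK = 0x3FC00000    # bits 22..29

TAG_JAPANESE = 5   # hypothetical tag value
TAG_KOREAN = 7     # hypothetical tag value

def make_char(leading_char: int, code: int) -> int:
    """Pack an 8-bit leading char and a 22-bit code into one value."""
    return ((leading_char & 0xFF) << LEADING_CHAR_SHIFT) | (code & CHAR_CODE_MASK)

def leading_char(value: int) -> int:
    """Extract the tag from a packed character value."""
    return (value & LEADING_CHAR_MASK) >> LEADING_CHAR_SHIFT

def char_code(value: int) -> int:
    """Extract the code from a packed character value."""
    return value & CHAR_CODE_MASK

# The same code (U+9AA8) tagged two different ways:
ja = make_char(TAG_JAPANESE, 0x9AA8)
ko = make_char(TAG_KOREAN, 0x9AA8)
print("equal values:", ja == ko)                       # False
print("equal codes:", char_code(ja) == char_code(ko))  # True
```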

Now, here's why I feel those areas are not worth keeping, especially in their current, buggy state:
- Non-Unicode character sets
The main reasons for supporting these would be:
1) Size reduction. All WideStrings use 32 bits per character, so that's moot.
2) No need to convert code points when using fonts indexed by JIS X 0208 (or other non-Unicode) code points. I've yet to see a free/TrueType font using anything but Unicode, and since we would be the creators of any theoretical StrikeFontSet covering other languages, we could avoid it anyway.
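Point 2 is about the conversion step a non-Unicode code point needs before a Unicode font can draw it. As a sketch (Python, not the image's own conversion code), a two-byte JIS X 0208 code can be mapped to Unicode by setting the high bit of both bytes, which yields its EUC-JP encoding, and decoding that with Python's euc_jp codec. The function name is hypothetical.

```python
def jisx0208_to_unicode(jis_code: int) -> int:
    """Map a JIS X 0208 code (two bytes in 0x21..0x7E, packed as 0xHHLL)
    to a Unicode code point via its EUC-JP encoding.
    Raises ValueError for out-of-range bytes and UnicodeDecodeError
    for unassigned positions."""
    high, low = jis_code >> 8, jis_code & 0xFF
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError(f"not a JIS X 0208 code: {jis_code:#06x}")
    # EUC-JP stores a JIS X 0208 character as its two bytes with bit 7 set.
    euc_bytes = bytes([high | 0x80, low | 0x80])
    return ord(euc_bytes.decode("euc_jp"))

for jis in (0x3441, 0x3B7A):   # the two characters of the word "kanji"
    print(f"JIS {jis:#06x} -> U+{jisx0208_to_unicode(jis):04X}")
```

With this, 0x3441 maps to U+6F22 (漢) and 0x3B7A to U+5B57 (字); a font indexed by JIS codes would skip exactly this table lookup.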

If in the future it becomes desirable to support encodings other than Unicode for internal strings, I feel separate String subclasses would be a cleaner solution.
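A minimal sketch of what separate subclasses could look like, in Python with hypothetical class names (nothing here is Pharo API): the encoding is a property of the string's class, so converting to Unicode never consults a per-character tag.

```python
class EncodedString:
    """Hypothetical base class: each subclass fixes one encoding for its contents."""
    codec = None  # set by concrete subclasses

    def __init__(self, data: bytes):
        if self.codec is None:
            raise TypeError("use a concrete subclass")
        self.data = data

    def as_unicode(self) -> str:
        """Convert to a Unicode string using the class's encoding only."""
        return self.data.decode(self.codec)


class Latin1String(EncodedString):
    codec = "latin-1"


class EUCJPString(EncodedString):
    # EUC-JP encodes JIS X 0208 characters (among others) as byte pairs.
    codec = "euc_jp"


print(Latin1String(b"caf\xe9").as_unicode() == "caf\u00e9")             # True
print(EUCJPString(b"\xb4\xc1\xbb\xfa").as_unicode() == "\u6f22\u5b57")  # True
```

Adding another encoding then means adding a subclass, rather than reserving another tag value inside every character.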

- Font selection / Han unification disambiguation
IMHO, this is obsoleted by the use of standard TrueType fonts. As long as one does not use StrikeFontSets to display a string, leadingChar currently provides no benefit.
Yes, one could potentially also select a different FreeTypeFont based on it whenever a run is encountered, but AFAIK the fonts themselves contain no metadata indicating which glyph variants they include (if they support variants at all). Automatic fallback to another font when the current font doesn't cover a glyph would be a higher priority.
Even in that case, the variant could be a property of the current locale instead. That would mean you can't correctly display both Korean and Japanese text in the same image, but IMHO it would be an acceptable tradeoff.
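The locale-based alternative, combined with the font fallback mentioned above, could look roughly like the following Python sketch. Font names and coverage sets are hypothetical toy data. U+9AA8 (骨) is used because it is often cited as a Han-unified character whose glyph differs between regional variants. With locale "ja", Korean text containing it gets the Japanese font's glyph, which is the tradeoff described above.

```python
# Hypothetical fonts and the code points each one covers (toy data).
FONT_COVERAGE = {
    "JapaneseFont": {0x9AA8, 0x6F22, 0x3042},   # includes hiragana U+3042
    "KoreanFont": {0x9AA8, 0x6F22, 0xD55C},     # includes hangul U+D55C
    "LatinFont": set(range(0x20, 0x7F)),        # printable ASCII
}

# Preference order per locale; the locale, not a per-character tag,
# decides which variant of a unified Han character is drawn.
LOCALE_PREFERENCE = {
    "ja": ["JapaneseFont", "KoreanFont", "LatinFont"],
    "ko": ["KoreanFont", "JapaneseFont", "LatinFont"],
}

def font_for(code_point: int, locale: str):
    """Return the first font in the locale's preference order that covers
    code_point (falling back down the list), or None if no font does."""
    for font in LOCALE_PREFERENCE.get(locale, list(FONT_COVERAGE)):
        if code_point in FONT_COVERAGE[font]:
            return font
    return None

print(font_for(0x9AA8, "ja"))    # JapaneseFont
print(font_for(0x9AA8, "ko"))    # KoreanFont
print(font_for(0xD55C, "ja"))    # KoreanFont (fallback)
```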

