[Pharo-dev] Character>>#leadingChar

Henrik Johansen henrik.s.johansen at veloxit.no
Mon Oct 21 05:37:18 EDT 2013


As an added bonus, with leadingChar gone, asInteger / asUnicode / codePoint / charCode / asciiValue would all share the same definition: ^value :)

Cheers,
Henry

P.S. codePoint is currently buggy; it should be ^self asUnicode.
It currently returns the leadingChar-tagged value, potentially in a different character set, which I'd hardly say meets the ANSI definition:
"Return the encoding value of the receiver in the implementation defined execution character set."
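
To illustrate why the tagged value is not a usable code point, here is a minimal Python model, not Pharo code. It assumes the layout I recall from Squeak-derived images: the low 22 bits hold the code within the character set and the next 8 bits hold the leadingChar tag. The tag number used below is hypothetical, as are all the function names.

```python
# Model of a leadingChar-tagged character value (assumed layout, see above).
CODE_BITS = 22
CODE_MASK = (1 << CODE_BITS) - 1   # low 22 bits: code within the character set
LEADING_MASK = 0xFF << CODE_BITS   # next 8 bits: leadingChar tag


def tagged_value(leading_char, code):
    """Combine a leadingChar tag and a code into one integer value."""
    return (leading_char << CODE_BITS) | (code & CODE_MASK)


def leading_char(value):
    """Extract the leadingChar tag from a tagged value."""
    return (value & LEADING_MASK) >> CODE_BITS


def char_code(value):
    """Extract the untagged code from a tagged value."""
    return value & CODE_MASK


HIRAGANA_A = 0x3042   # U+3042 HIRAGANA LETTER A
JAPANESE_TAG = 5      # hypothetical tag number, for illustration only

value = tagged_value(JAPANESE_TAG, HIRAGANA_A)
print(hex(value), hex(char_code(value)), leading_char(value))
```

With a non-zero tag, the raw value differs from the Unicode code point, so a codePoint that answers the raw value does not answer an encoding value in any single character set. Masking the tag off is only enough when the tagged set is itself Unicode-based; a set such as JISX0208 would additionally need a table conversion, which this sketch does not model.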


On Oct 21, 2013, at 11:18 , Henrik Johansen <henrik.s.johansen at veloxit.no> wrote:

> 
> On Oct 18, 2013, at 6:34 , Sven Van Caekenberghe <sven at stfx.eu> wrote:
> 
>> Hi,
>> 
>> So once again we have an issue with Character>>#leadingChar, see
>> 
>> https://pharo.fogbugz.com/f/cases/6368
>> 
>> Do we really need this ?
>> Any Japanese, Chinese or Korean users willing to comment ?
>> 
>> Thx,
>> 
>> Sven
>> 
> 
> I'm not any of those, but my short answer would be no.
> 
> As for the long answer:
> LeadingChar has too many responsibilities:
> - Character set of string
> - Font selection (see StrikeFontSet)
> - Han unification disambiguation (through the above font selection)
> 
> The conflation of these, and the confusion over which of them leadingChar actually implies, easily leads to bugs, and has already done so (compare Character >> asUnicode with JapaneseEnvironment >> fromJISX0208String:, for example).
> I would bet 100€ that StrikeFontSet no longer works as intended either, that is, for displaying glyphs beyond Latin-1 using StrikeFonts.
> 
> Now, here's why I feel those areas are not worth keeping, especially in their current, buggy state:
> - Non-unicode character sets
> The main reasons for supporting this would be
> 1) Size reduction. All WideStrings use 32 bits per character, so that's moot.
> 2) No need to convert code points when using fonts stored with JISX0208 etc. code points. I've yet to see a FreeType/TrueType font using anything but Unicode, and since we'd be the creators of any theoretical StrikeFontSet covering other languages, we could avoid non-Unicode code points there anyway.
> 
> If, in the future, it'd be desirable to support encodings other than Unicode for internal strings, I feel separate subclasses are a cleaner solution.
> 
> - Font selection / Han unification disambiguation
> IMHO, obsoleted by the use of standard TrueType fonts. Unless one uses StrikeFontSets to display a string, leadingChar currently has no benefit.
> Yes, one could potentially also select different FreeTypeFonts based on it when a run is encountered, but afaik the fonts themselves contain no metadata about which variant of the glyphs they include, if they even support them. (Automatic fallback to another font when the current font doesn't cover a glyph would be a higher priority.)
> Even in that case, it could be a property of the current locale instead. That means you can't display both Korean and Japanese text correctly in the same image, but it'd be an (imho) acceptable tradeoff.
> 
> Cheers,
> Henry
> 
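
The locale-based alternative quoted above, combined with fallback to another font when a glyph is missing, could look roughly like the following. This is a language-neutral Python sketch under stated assumptions, not Pharo or FreeType API: Font, pick_font, the font names and the coverage sets are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Font:
    """Hypothetical font record: a name plus the code points it has glyphs for."""
    name: str
    covered: frozenset

    def covers(self, code_point):
        return code_point in self.covered


def pick_font(code_point, locale, fonts_by_locale, fallbacks):
    """Prefer the font registered for the current locale, then try fallbacks.

    The Han variant is decided by the locale, not by a per-character tag.
    Returns None when no candidate font covers the code point.
    """
    preferred = fonts_by_locale.get(locale)
    candidates = ([preferred] if preferred is not None else []) + list(fallbacks)
    for font in candidates:
        if font.covers(code_point):
            return font
    return None


# Illustrative coverage only: U+9AA8 is a unified Han character,
# U+D55C is a Hangul syllable.
ja_font = Font("HypotheticalMincho", frozenset({0x9AA8, 0x3042}))
ko_font = Font("HypotheticalBatang", frozenset({0x9AA8, 0xD55C}))
fonts_by_locale = {"ja": ja_font, "ko": ko_font}
fallbacks = [ja_font, ko_font]

han = pick_font(0x9AA8, "ja", fonts_by_locale, fallbacks)
hangul = pick_font(0xD55C, "ja", fonts_by_locale, fallbacks)
print(han.name, hangul.name)
```

In a Japanese locale the unified Han character is drawn with the Japanese font, while the Hangul syllable falls back to the Korean one. The tradeoff is the one already named: every Han character in the image gets the locale's variant, whatever language the text is actually in.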
