[Pharo-dev] Character>>#leadingChar

Henrik Johansen henrik.s.johansen at veloxit.no
Mon Oct 21 09:14:05 EDT 2013


On Oct 21, 2013, at 2:13 , Igor Stasenko <siguctua at gmail.com> wrote:

> 
> 
> 
> On 21 October 2013 13:41, Henrik Johansen <henrik.s.johansen at veloxit.no> wrote:
> 
> On Oct 21, 2013, at 11:56 , Henrik Johansen <henrik.s.johansen at veloxit.no> wrote:
> 
>> My guess is most if not all systems which do support that case, implement a higher level abstraction than "String" to take care of it.
> 
> … or not :)
> http://www.unicode.org/faq/vs.html
> http://www.unicode.org/reports/tr37/
> 
> Which means, you can solve the problems caused by Han Unification using standard Unicode.
> Seems like a lot of fun to implement support for :)
> 
> 
> Why should we care?
> 
> We define a string as a sequence of Characters.
> We say that every Character can be uniquely identified by its unicode value.
> 
> And we say nothing about things like locale, language, etc., because those are higher-level concepts; e.g.
> mapping a Unicode value (or a sequence of them) to a sequence of glyphs to display on screen, using whatever font, is outside the scope of the 'String' definition.
>  
> Cheers,
> Henry

The proposition was that leadingChar might be valuable on the grounds that Unicode doesn't allow you to differentiate Korean/Japanese characters in the same document.
Variation sequences show Unicode has acquired a built-in mechanism for doing just that, so the proposition is false.
Implementing support for them on the paths from user input -> String instance and from String -> glyph display is outside the definition of a String, sure, but it is work nonetheless.
Maybe I should've put quotes around "fun". :P 

One would probably also need to deal with them if implementing Unicode functionality that arguably *is* within String scope btw, such as equality, collation and normalization.
Just because Strings currently treat a code point as a character doesn't make that 100% correct :)
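
To make the equality/normalization point concrete, here is a minimal sketch in Python (not Pharo, purely for illustration). It assumes a base ideograph followed by VARIATION SELECTOR-17 (U+E0100) as the sample sequence; the helper names are hypothetical, and only the two standard variation selector blocks are handled (the Mongolian free variation selectors are ignored). It shows that code-point equality and NFC both treat the sequence as different from the bare base character, so a "variant-insensitive" comparison has to strip the selectors itself.

```python
import unicodedata

def is_variation_selector(ch):
    # Hypothetical helper: VS1..VS16 (U+FE00..U+FE0F) and
    # VS17..VS256 (U+E0100..U+E01EF) only.
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF

def strip_variation_selectors(s):
    # Hypothetical helper: drop selectors so variants compare as equal.
    return "".join(ch for ch in s if not is_variation_selector(ch))

base = "\u845B"                # a unified CJK ideograph
with_vs = "\u845B\U000E0100"   # same ideograph + VARIATION SELECTOR-17

# Code point = character: the two strings differ in length and content.
lengths = (len(base), len(with_vs))
plain_equal = base == with_vs

# NFC leaves variation selectors in place, so normalization alone
# does not make the two strings equal either.
nfc_equal = (unicodedata.normalize("NFC", base)
             == unicodedata.normalize("NFC", with_vs))

# Only an explicit policy of ignoring selectors makes them match.
loose_equal = (strip_variation_selectors(base)
               == strip_variation_selectors(with_vs))

print(lengths, plain_equal, nfc_equal, loose_equal)
```

Whether String equality *should* ignore the selector is a policy question; the sketch only shows that the decision has to be made somewhere.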

Cheers,
Henry