[Pharo-users] String>>match issue

btc at openinworld.com btc at openinworld.com
Tue Mar 11 12:01:44 EDT 2014


vmusulainen wrote:
> Hi!
>
> From comment to #match: method
>
> match: text
> 	"Answer whether text matches the pattern in this string.
> 	Matching ignores upper/lower case differences.
>
> Check it now:
> 1. 'V' match: 'v'  -> true "Ok, It's fine"
> 2. 'Ш' match: 'ш' -> false "Use non-English (Cyrillic) letters Ups-s"
>
> -regards
> Vladimir Musulainen

If you debug ('Ш' match: 'ш') and trace through to 
WideString>>findSubstring:in:startingAt:matchTable:
you will find that (c1 asciiValue) --> 1096 and (c2 asciiValue) --> 1064,
but (matchTable size) --> 256.
Neither character is in the matchTable, so the comparison value 
defaults to the character's own (asciiValue + 1) with no case folding, 
and the two values differ.
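
The fallback described above can be sketched in a Workspace roughly as 
follows. This is only an illustration, not the actual WideString source: 
the names table and valueFor are hypothetical, the 256-entry table merely 
mimics CaseInsensitiveOrder, and the size check is my assumption about 
how the real method decides whether a character is in the table.

```smalltalk
"Hypothetical stand-in for CaseInsensitiveOrder: identity table for 0..255,
 with a..z folded onto the values of A..Z."
| table valueFor |
table := (0 to: 255) asArray.
($a to: $z) do: [:c |
    table at: c asciiValue + 1
          put: (table at: c asUppercase asciiValue + 1)].

"Assumed lookup rule: use the table when the character fits in it,
 otherwise fall back to the character's own (asciiValue + 1)."
valueFor := [:char |
    char asciiValue < table size
        ifTrue: [table at: char asciiValue + 1]
        ifFalse: [char asciiValue + 1]].

(valueFor value: $V) = (valueFor value: $v).    "true: both fold to 86"
(valueFor value: (Character value: 1064))
    = (valueFor value: (Character value: 1096)).  "false: 1065 vs 1097"
```

Using (Character value: 1064) rather than a literal keeps non-ASCII 
characters out of the snippet itself.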

So, as a proof of concept, change this...
String class>>initialize
    "MODIFIED below: slots beyond the first 256 stay nil unless set explicitly"
    CaseInsensitiveOrder := (Array new: 2000) fillFrom: AsciiOrder with: #value.   "<--MODIFIED"
    ($a to: $z) do:
        [:c | CaseInsensitiveOrder at: c asciiValue + 1
                put: (CaseInsensitiveOrder at: c asUppercase asciiValue + 1)].
    CaseInsensitiveOrder at: 1096 + 1 put: 1096.    "<--ADDED"
    CaseInsensitiveOrder at: 1064 + 1 put: 1096.    "<--ADDED"

then in Workspace evaluate "String initialize"
and now ('Ш' match: 'ш') --> true.

I'm not sure of the best way to handle that in the long term.

btw, you may be tempted to use ($ш asciiValue) in place of 1096 in the 
code, but there may be a problem (I'm not sure) saving an image 
containing Unicode characters.

Maybe String's class variables CaseInsensitiveOrder & CaseSensitiveOrder 
would be better handled as individual classes, 
to provide flexibility for other sort orders like 
CaseInsensitiveGermanPhonebook [1]. Also, 
String>>findString:startingAt:caseSensitive: should probably 
double-dispatch and be overridden by WideString.

cheers -ben

[1] http://userguide.icu-project.org/collation
[2] http://www.w3.org/International/wiki/Case_folding
[3] http://cldr.unicode.org/index/cldr-spec/collation-guidelines






