[Pharo-dev] Alt-Space is Non-breaking S

Henrik Johansen henrik.s.johansen at veloxit.no
Thu May 15 05:33:47 EDT 2014

On 13 May 2014, at 8:05, Eliot Miranda <eliot.miranda at gmail.com> wrote:

> On Tue, May 13, 2014 at 4:08 AM, Henrik Johansen <henrik.s.johansen at veloxit.no> wrote:
>> … Which is extremely annoying when writing blocks with arguments on a Norwegian OSX keyboard.
>> | is Alt-7, and when I don’t release the Alt key before the following space, an NBSP is inserted instead of a space character, which breaks both the parser and syntax highlighting.
>> When doit’ing, you get the syntax error:
>> [ :a |  Unknown character ->a ] (notice also the off-by-one indicator: it’s not the a, but the preceding space, which is unknown).
>> Would it be better if NBSP were treated equally to a normal space by the parser, or is there some keyboard binding I can revert?
>
> I think getting the parser to accept NBSP is a really bad idea.  If you want to export code to other dialects you can't rely on that happening.  WTF is an accelerator key doing introducing an illegal sequence in the first place?  Surely the key could insert a legal sequence?

Hmm, it seems Alt-Space is mapped to NBSP by OSX itself, and has been for a long time. I guess I just haven’t noticed before for some reason…

I agree that accepting source with NBSP would be bad. I was thinking the parser could substitute them with normal spaces instead of raising errors.
That seems viable to me (at least in the RBParser of 3.0): there’s an error-handling block (in RBParser >> #parserError:) which could possibly do it.
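
To make the substitution idea concrete, here is a minimal workspace sketch. It does not hook into RBParser >> #parserError: at all; it just normalises the source before handing it to the parser, and the variable names are purely illustrative.

```smalltalk
"Workspace sketch: swap NBSP (code point 160) for a plain space before parsing.
 Variable names are illustrative, not taken from the image."
| nbsp source normalized |
nbsp := Character value: 160.
"A block typed with Alt held down through the space after the bar."
source := '[ :a |' , (String with: nbsp) , 'a ]'.
normalized := source copyReplaceAll: (String with: nbsp) with: ' '.
RBParser parseExpression: normalized
```

One caveat with a blanket replacement like this: it would also rewrite any NBSP that appears inside a string literal or a comment, so a real fix would probably want to substitute only where the scanner reports the unknown character.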

As for the PS, it seems to me the off-by-one is caused by:
scanError: theCause
	currentCharacter ifNotNil: [ :char | buffer nextPut: char ].
	^ RBErrorToken
		value: buffer contents asString
		start: tokenStart
		cause: theCause
		location: stream position + 1

Now, I don’t mind the change that introduced a separate ErrorToken, but I’m more ambivalent about switching from tokenStart to the stream’s position as the error’s location.

In 99% of cases, the token of a syntax error will be a single character (if you write #was£, for example, #was will parse as a valid token before £ gives an unknown character error). The one exception I can think of is a missing closing quote on a string literal or comment, where location will be the end of the method and tokenStart will be at the unmatched quote. Even in that case, though, isn’t it better to display the Unmatched message *at* the unmatched quote itself, rather than at the end of the method?

Are there cases I’ve overlooked where the current location is better, or should RBErrorToken be changed back to just use tokenStart as the errorLocation? (That would also simplify RBParser >> #parserError:.)
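
For reference, the change being asked about might look like the sketch below. It is the scanError: method quoted earlier with only the location: argument altered; whether any other code relies on the current stream-based location has not been checked, so treat it as an illustration rather than a tested patch.

```smalltalk
scanError: theCause
	"Sketch only: report the error at the start of the offending token
	 instead of at the stream's current position."
	currentCharacter ifNotNil: [ :char | buffer nextPut: char ].
	^ RBErrorToken
		value: buffer contents asString
		start: tokenStart
		cause: theCause
		location: tokenStart
```

With this variant, an unterminated string literal or comment would be flagged at its opening quote rather than at the end of the method.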


PS. Also, with an error token that includes a cause, does it ever make sense to use the string parameter to parserError:?