[Compiler] A Small experiment with Syntax Errors

Marcus Denker
Thu, Dec 8, 2022 9:56 AM

This is a small experiment about syntax errors when evaluating code. With #evaluate:, we can compile and run a DoIt, for example:

Smalltalk compiler evaluate: '1+2'

Imagine you evaluate a String, but it is actually not syntactically correct:

Smalltalk compiler evaluate: '1+'

We get an exception while parsing, and a SyntaxErrorDebugger is opened.

This tool is rarely used and is fairly odd. In practice we use it for two things: to look at the error message, and to use the (not easy to find) menu "debug calling process" to see where the eval was called, so we can understand what went wrong (why we constructed a string of invalid code). The problem is that the tool is not easy to understand, and it is not easy to maintain as part of the debugger framework (it is a bit broken even now, especially with regard to edit and continue, which does not matter for DoIts, though).

Could we not do better?

Evaluating code means:

  • we parse the string to an AST
  • we translate the AST to a DoIt method
  • we execute that method
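
As a rough sketch, the three steps can be run by hand using the same compiler messages that appear later in this mail (the variable names doItMethod and result are just for illustration):

```smalltalk
"Steps 1 and 2: parse the source and translate the AST into a DoIt method.
 noPattern: true tells the compiler that the source is a plain expression,
 not a method definition starting with a selector."
doItMethod := Smalltalk compiler
	noPattern: true;
	compile: '1+2'.

"Step 3: execute the method, with nil as the receiver."
result := doItMethod valueWithReceiver: nil arguments: #().
```

This is roughly what #evaluate: does for us in one go.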

The error happens in the parse phase: no method gets constructed; instead, an exception is raised. There is quite some machinery to allow restarting the parse after the error is fixed (see the exception ReparseAfterSourceEditing and how it is used; it is quite hard to understand).

The idea that syntax errors are hard errors is quite natural. Batch compilers put some effort into auto-repair. This was very useful in the old days (punch cards!) to avoid finding an aborted job the next morning just because of a missing comma. These days it is used to produce better error messages. But it is still common to treat syntax errors as hard errors. This is seen as good because they are detected very early, and refusing to compile is seen as the ideal outcome for any kind of error. In the end, type systems try to extend this to non-syntactic problems: if a program is wrong, it should not run. If we call a method that does not exist, the program should not compile.

What if we do not follow this standard advice and go in the exact opposite direction?

Let's try:

Our tools parse code after every keystroke, and the resulting AST is used for syntax highlighting. This means we can parse the expression

'1+'

without any problem, if we use the same mode for the Parser:

Smalltalk compiler
	options: #(+ optionParseErrors);
	parse: '1+'

We see that the parser created an AST, with a ParseErrorNode as the argument of the message send +.

This AST is used by the syntax highlighter and for code completion. But in the end it is just the normal AST that the compiler uses to compile source text to CompiledMethods.

So can we not just implement the code in the compiler to compile this AST?

The node implements #acceptVisitor: (the visitor pattern), which calls #visitParseErrorNode:. So we can add support to the compiler by just implementing OCASTTranslator>>#visitParseErrorNode:

visitParseErrorNode: anErrorNode
	methodBuilder
		pushLiteralVariable: RuntimeSyntaxError binding;
		pushLiteral: anErrorNode;
		send: #signalSyntaxError:

This just reads the global variable RuntimeSyntaxError (which is a subclass of Error), pushes the error node itself as an argument, and sends #signalSyntaxError:, a class-side method:

signalSyntaxError: aNode
	"we use signalSyntaxError: instead of signal: so we can quickly check
	compiledMethods for syntax errors by checking the literals"
	^ (self new errorNode: aNode) signal

To see what the compiler compiled for that expression, let's compile a DoIt method (this is what #evaluate: does to get a method to execute):

compiledMethod := Smalltalk compiler
	noPattern: true;
	options: #(+ optionParseErrors);
	compile: '1+'

So it did compile this bytecode, as expected:

49 <51> pushConstant: 1
50 <10> pushLit: RuntimeSyntaxError
51 <21> pushConstant: RBParseErrorNode()
52 <92> send: signalSyntaxError:
53 <60> send: +
54 <5C> returnTop
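
The listing shows #signalSyntaxError: being sent as an ordinary message, so its selector should end up in the method's literals. A minimal sketch of the "quick check" mentioned in the method comment above, assuming the experimental RuntimeSyntaxError class and the translator change from this mail are loaded (the variable hasSyntaxError is only for illustration):

```smalltalk
compiledMethod := Smalltalk compiler
	noPattern: true;
	options: #(+ optionParseErrors);
	compile: '1+'.

"The selector of a normal message send is stored as a literal of the method,
 so a simple membership test on the literals is enough for a first check."
hasSyntaxError := compiledMethod literals includes: #signalSyntaxError:.
```

This only detects the dedicated selector; it would not work if the compiler sent plain #signal: instead, which is the reason given for the separate message.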

To finish what #evaluate: does, we can execute the compiledMethod (with a nil receiver):

compiledMethod valueWithReceiver: nil arguments: #()

And it works! The standard debugger opens on the RuntimeSyntaxError.
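
Since RuntimeSyntaxError is described above as a subclass of Error, it should also be possible to handle it like any other exception instead of landing in the debugger. A small sketch, again assuming the experimental class and compiler change from this mail are in the image (the symbol #syntaxError is an arbitrary placeholder value):

```smalltalk
compiledMethod := Smalltalk compiler
	noPattern: true;
	options: #(+ optionParseErrors);
	compile: '1+'.

"Run the DoIt method, but catch the runtime syntax error with a normal
 exception handler and answer a placeholder value instead."
result := [ compiledMethod valueWithReceiver: nil arguments: #() ]
	on: RuntimeSyntaxError
	do: [ :error | error return: #syntaxError ].
```

The error is signaled before the + send is reached, so the handler runs and the block answers the placeholder.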

Can we do that by default for #evaluate? Go to OpalCompiler>>#evaluate and add the optionParseErrors line before compiling:

...
options:  #(+ optionParseErrors);
doItMethod := self compile.
...

If you now run

Smalltalk compiler evaluate: '1+'

it will open the normal debugger.

What needs to be improved?

  • The error message is not shown anywhere other than in the debugger window title
    (but all the information needed to print a nice error in place is contained in the AST)
  • Allowing edit and proceed could be interesting (though not that much for #evaluate:)

(I will turn this into a blog post later)

Marcus