Moving from analysis to synthesis

Now that I’m on my summer vacation, I’ve had some time to return to the CoolBasic Classic compiler again (after a month or two of slacking, pardon me), and I’m happy to say that the analysis phase of the compilation process is now basically done. After code analysis the compiler has all the information it needs in order to produce the final byte code output. This phase is called “synthesis”, and it executes by recursively iterating through all statements based on their scope. This approach enables the compiler to omit byte code generation for unreachable code blocks such as If/ElseIf/While/Case blocks that have a constant expression that evaluates to false. Moreover, we could have those user-defined Functions and Subs who aren’t used anywhere in code not to be included to the output IL at all. However, it’s important to process all those code blocks fully even though they wouldn’t get written to IL because we want the compiler to validate the entire source code.

Some of the statements are already written into the output byte code stream such as the assignment statement and a Sub call. Similarly to function calls, the compiler inserts any omitted optional parameters and injects any type conversion operators that might be needed. There are still dozens of statement types to process, but all of them should be quite simple to implement. I decided to tackle the If/ElseIf/Else structure next because it requires some branching infrastructure I need to develop first. Once that’s sorted out it’s easy to implement other program flow control statements such as the While and Until loops.

Branching i.e. jumping is an interesting topic on its own because it’s a common key element and therefore needs to be very efficient. Labels are local to their enclosing Function or Sub symbol, and also the Root (the main program) has its own label “scope”. This means that the programmer can have identically named labels as long as they exist in a different “stack fragment”. The Root/Function/Sub symbols encompass a dictionary of labels, and they’re populated already in the parsing phase so that all labels are known during synthesis. The compiler can therefore match forward jumps without expensive scanning. On the other hand, this technique only applies to GoTo statements – many other statement types need to generate some jump targets.

Statements are processed in the natural program flow order during synthesis. For example, an If statement is processed before the corresponding EndIf statement, but the If statement in question still needs to know where the EndIf statement is in order to jump to the proper location if the expression is false. The compiler needs to link all cognate If/ElseIf/Else/EndIf statements together so that they can access the next member in the chain as well as the final EndIf. For branching to work, the compiler stores the byte code pointer of each statement (labels are considered statements as well even though they don’t generate any IL) when they get iterated by the synthesizer. Of course the If statement can’t know the final address of the EndIf statement upon processing because the EndIf doesn’t yet have the calculated offset, i.e. forward jumps cannot be determined fully at this point.

To solve this problem, all jumps (apart from GoTo statements) are cached in a special list that consist of tuples of an IL jump instruction and the target statement. Just before the IL is physically written to a file the compiler simply iterates this list and fills in the correct jump instruction addresses with the calculated code offsets found in the completely processed statements.

We’re quite close from having a working compiler prototype for internal testing 🙂

Comments are closed.

Copyright © All Rights Reserved · Green Hope Theme by Sivan & schiy · Proudly powered by WordPress