Moving from analysis to synthesis
Now that I’m on my summer vacation, I’ve had some time to return to the CoolBasic Classic compiler again (after a month or two of slacking, pardon me), and I’m happy to say that the analysis phase of the compilation process is now basically done. After code analysis the compiler has all the information it needs in order to produce the final byte code output. This phase is called “synthesis”, and it executes by recursively iterating through all statements based on their scope. This approach enables the compiler to omit byte code generation for unreachable code blocks such as If
/ElseIf
/While
/Case
blocks that have a constant expression that evaluates to false. Moreover, we could have those user-defined Functions and Subs who aren’t used anywhere in code not to be included to the output IL at all. However, it’s important to process all those code blocks fully even though they wouldn’t get written to IL because we want the compiler to validate the entire source code.
Some of the statements are already written into the output byte code stream such as the assignment statement and a Sub
call. Similarly to function calls, the compiler inserts any omitted optional parameters and injects any type conversion operators that might be needed. There are still dozens of statement types to process, but all of them should be quite simple to implement. I decided to tackle the If
/ElseIf
/Else
structure next because it requires some branching infrastructure I need to develop first. Once that’s sorted out it’s easy to implement other program flow control statements such as the While
and Until
loops.
Branching i.e. jumping is an interesting topic on its own because it’s a common key element and therefore needs to be very efficient. Labels are local to their enclosing Function or Sub symbol, and also the Root (the main program) has its own label “scope”. This means that the programmer can have identically named labels as long as they exist in a different “stack fragment”. The Root
/Function
/Sub
symbols encompass a dictionary of labels, and they’re populated already in the parsing phase so that all labels are known during synthesis. The compiler can therefore match forward jumps without expensive scanning. On the other hand, this technique only applies to GoTo
statements – many other statement types need to generate some jump targets.
Statements are processed in the natural program flow order during synthesis. For example, an If
statement is processed before the corresponding EndIf
statement, but the If
statement in question still needs to know where the EndIf
statement is in order to jump to the proper location if the expression is false. The compiler needs to link all cognate If
/ElseIf
/Else
/EndIf
statements together so that they can access the next member in the chain as well as the final EndIf
. For branching to work, the compiler stores the byte code pointer of each statement (labels are considered statements as well even though they don’t generate any IL) when they get iterated by the synthesizer. Of course the If
statement can’t know the final address of the EndIf
statement upon processing because the EndIf
doesn’t yet have the calculated offset, i.e. forward jumps cannot be determined fully at this point.
To solve this problem, all jumps (apart from GoTo
statements) are cached in a special list that consist of tuples of an IL jump instruction and the target statement. Just before the IL is physically written to a file the compiler simply iterates this list and fills in the correct jump instruction addresses with the calculated code offsets found in the completely processed statements.
We’re quite close from having a working compiler prototype for internal testing 🙂