Branching rules

Last time I talked about synthesis and IL generation. Currently, half of the statements already write valid byte code. I have now completed the branching infrastructure that I mentioned in my previous blog post, and most conditional structures and loops are now prepared – including the If, While, and Until structures. The For-Next loop and the Select-Case structure are next. But before going there I decided to wrap up a quick summary about branching rules and what they mean in terms of code compilation.

First of all, there’s a mechanic in place that eliminates unreachable code. For example, if you had an If condition that has a constant expression that always evaluates to false, the compiler knows this during compile time and will not write the byte code for that block. Similarly, if you had a While loop that is known to be always true the compiler doesn’t write the byte code for the expression at all. Code elimination also applies to nested code blocks, meaning that everything inside an unreachable code block is ignored when the byte code output is written. Next, let’s look at the specs:

The If structure

  1. The If and ElseIf statements must know the next ElseIf/Else statement in order to branch when the expression is false
  2. The If, ElseIf and Else statements must know the EndIf statement that is associated with the condition chain; The If statement needs it in order to branch when the expression is false, and the ElseIf and Else statements need it in order to issue an unconditional jump to indicate the end of the previous block
  3. The ElseIf and Else statements must add an unconditional jump to the corresponding EndIf statement before any other byte code unless the statement in question is and ElseIf with a constant value expression of false
  4. If the If/ElseIf expression is a constant value, don’t write expression byte code
  5. If any of the If or ElseIf blocks evaluated to a constant expression of true, don’t write byte code for any of the subsequent blocks in the same chain
  6. If an If block or an ElseIf block has a constant expression of false, don’t write its byte code. This includes nested code blocks

In addition to the rules listed above there are some minor tweaks that accomplish cleaner byte code output; I’ve eliminated some branching instructions that are not needed, for example.

The Repeat-Until structure

  1. The Until and Forever statements must know the corresponding Repeat statement in order to branch to the correct location
  2. If the Until statement’s expression is a constant value of true, do nothing
  3. If the Until statement’s expression is a constant value of false, add an unconditional jump
  4. The Forever statement just adds an unconditional jump

The While-EndWhile structure

  1. The While statement needs to know the corresponding EndWhile statement in order to branch when the expression is false
  2. The EndWhile statement needs to know the corresponding While statement in order to issue an unconditional jump
  3. If the While statement’s expression is a constant value of false, don’t write byte code for the expression or code block
  4. If the While statement’s expression is a constant value of true, don’t write the expression’s byte code
  5. The EndWhile statement only adds an unconditional jump to the While statement

Controlling loops

The old “Exit” statement is now renamed as “Break”. It exits the Repeat, While, For, and Foreach loops, and continues execution from after the loop.

Similarly, a new loop control statement has been added. The “Continue” statement continues the encapsulating loop from its next iteration pass, but doesn’t exit the loop unless the associated condition dictates so.

The implementation of these is quite straight-forward; The Break statement needs to know the loop’s ending statement, and the Continue statement needs to know the loop’s starting statement. Both issue an unconditional jump.

I haven’t yet implemented these, and will retain doing so until I get the For-Next structure done. I might write a similar blog post about the For-Next structure and Select-Case structure next time…

Moving from analysis to synthesis

Now that I’m on my summer vacation, I’ve had some time to return to the CoolBasic Classic compiler again (after a month or two of slacking, pardon me), and I’m happy to say that the analysis phase of the compilation process is now basically done. After code analysis the compiler has all the information it needs in order to produce the final byte code output. This phase is called “synthesis”, and it executes by recursively iterating through all statements based on their scope. This approach enables the compiler to omit byte code generation for unreachable code blocks such as If/ElseIf/While/Case blocks that have a constant expression that evaluates to false. Moreover, we could have those user-defined Functions and Subs who aren’t used anywhere in code not to be included to the output IL at all. However, it’s important to process all those code blocks fully even though they wouldn’t get written to IL because we want the compiler to validate the entire source code.

Some of the statements are already written into the output byte code stream such as the assignment statement and a Sub call. Similarly to function calls, the compiler inserts any omitted optional parameters and injects any type conversion operators that might be needed. There are still dozens of statement types to process, but all of them should be quite simple to implement. I decided to tackle the If/ElseIf/Else structure next because it requires some branching infrastructure I need to develop first. Once that’s sorted out it’s easy to implement other program flow control statements such as the While and Until loops.

Branching i.e. jumping is an interesting topic on its own because it’s a common key element and therefore needs to be very efficient. Labels are local to their enclosing Function or Sub symbol, and also the Root (the main program) has its own label “scope”. This means that the programmer can have identically named labels as long as they exist in a different “stack fragment”. The Root/Function/Sub symbols encompass a dictionary of labels, and they’re populated already in the parsing phase so that all labels are known during synthesis. The compiler can therefore match forward jumps without expensive scanning. On the other hand, this technique only applies to GoTo statements – many other statement types need to generate some jump targets.

Statements are processed in the natural program flow order during synthesis. For example, an If statement is processed before the corresponding EndIf statement, but the If statement in question still needs to know where the EndIf statement is in order to jump to the proper location if the expression is false. The compiler needs to link all cognate If/ElseIf/Else/EndIf statements together so that they can access the next member in the chain as well as the final EndIf. For branching to work, the compiler stores the byte code pointer of each statement (labels are considered statements as well even though they don’t generate any IL) when they get iterated by the synthesizer. Of course the If statement can’t know the final address of the EndIf statement upon processing because the EndIf doesn’t yet have the calculated offset, i.e. forward jumps cannot be determined fully at this point.

To solve this problem, all jumps (apart from GoTo statements) are cached in a special list that consist of tuples of an IL jump instruction and the target statement. Just before the IL is physically written to a file the compiler simply iterates this list and fills in the correct jump instruction addresses with the calculated code offsets found in the completely processed statements.

We’re quite close from having a working compiler prototype for internal testing 🙂

Copyright © All Rights Reserved · Green Hope Theme by Sivan & schiy · Proudly powered by WordPress