Analysing the syntax

The compiler is now at the stage where the syntax of the program's basic structures needs to be validated. This pass involves parsing and data gathering: the former means syntax checking of classes, functions, constants etc., and the latter is basically building the Token Table for the program. There are 9 data structures to handle: constants, enumerations, program flow, jumps, structures, variable definitions, functions, operators and classes.

There are no “properties” in this list. While they would be exciting to implement, I’ve decided not to – just yet. Properties are class members that usually involve two functions: a getter and a setter for a certain attribute. Traditionally programmers wrote these as two separate functions, but it’s more elegant to use normal assignments and reads, even though they actually run more code under the hood. You could use something like this:

myClassVariable.SetMyPropertyValue(myValue)
variable = myClassVariable.GetMyPropertyValue()

However, the following way is cleaner:

myClassVariable.myProperty = myValue
variable = myClassVariable.myProperty

Both forms would still call a function within the class. I’ll keep this item on my TODO list, but let’s try to establish the core first 🙂
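To illustrate the idea, here is a minimal sketch in Python (not CoolBasic, whose property syntax has not been designed yet). The class name `Sprite`, the attribute `opacity` and the clamping rule are made up for the example; the point is only that a plain-looking assignment and read each run a function inside the class.

```python
class Sprite:
    """Hypothetical class; the name and the attribute are illustrative only."""

    def __init__(self):
        self._opacity = 0
        self.setter_calls = 0

    @property
    def opacity(self):
        # Getter: runs on every read of sprite.opacity
        return self._opacity

    @opacity.setter
    def opacity(self, value):
        # Setter: runs on every assignment, so it can validate or clamp
        self.setter_calls += 1
        self._opacity = max(0, min(255, value))


sprite = Sprite()
sprite.opacity = 300      # looks like a plain assignment, but calls the setter
current = sprite.opacity  # looks like a plain read, but calls the getter
```

The caller never sees the getter or setter by name, which is exactly what makes the second form cleaner.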

Back to the analysing process. Those 9 keyword statements need to be parsed so that CoolBasic can compile a list of all token names. Naturally, each token must have additional data assigned to it, such as access modifiers, inheritance modifiers, overriding modifiers and certain flag values. I don’t want to attach all these fields to every token, because they are never all needed at the same time and this would only munch memory needlessly in large programs. So it’s quite a challenging task to come up with an “optimal” way to store this kind of data that bends to all needs at once.

The programming tool I’m using to create the CoolBasic V3 compiler doesn’t really support classes and inheritance properly (woot! why can’t you just use Visual Studio?! – It’s not fast enough ^^). I’m using allocated memory blocks to store only the most relevant data. This will make my code somewhat uglier than I had hoped, but at least it’s memory efficient. Note to self: Just remember to maintain the clean-up functions accordingly, mmkay…
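One way to picture "only the most relevant data" is a variable-length record with a flag byte that says which optional fields follow. The sketch below is in Python and is purely an assumption: the flag names, the field order and the byte layout are invented for illustration, and the post does not describe the real memory block format.

```python
import struct

# Hypothetical flag bits; the real compiler's layout is not described in the post.
HAS_ACCESS = 0x01       # record carries an access modifier byte
HAS_INHERITANCE = 0x02  # record carries an inheritance modifier byte


def pack_token(name, kind, access=None, inheritance=None):
    """Pack a token into a bytes block that holds only the fields it needs."""
    flags = 0
    extras = b""
    if access is not None:
        flags |= HAS_ACCESS
        extras += struct.pack("<B", access)
    if inheritance is not None:
        flags |= HAS_INHERITANCE
        extras += struct.pack("<B", inheritance)
    encoded = name.encode("utf-8")
    # Header: kind (1 byte), flags (1 byte), name length (2 bytes)
    header = struct.pack("<BBH", kind, flags, len(encoded))
    return header + encoded + extras


def unpack_token(block):
    """Read a block back, consulting the flags to know which fields exist."""
    kind, flags, name_len = struct.unpack_from("<BBH", block, 0)
    offset = 4
    name = block[offset:offset + name_len].decode("utf-8")
    offset += name_len
    access = inheritance = None
    if flags & HAS_ACCESS:
        (access,) = struct.unpack_from("<B", block, offset)
        offset += 1
    if flags & HAS_INHERITANCE:
        (inheritance,) = struct.unpack_from("<B", block, offset)
        offset += 1
    return {"name": name, "kind": kind, "access": access, "inheritance": inheritance}


constant = pack_token("PI", kind=1)                           # no modifiers stored
method = pack_token("Draw", kind=7, access=2, inheritance=1)  # two extra bytes
```

A constant pays nothing for modifiers it never uses, while a class member carries only the extra bytes it needs. The price is the uglier reading code mentioned above.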

And finally, some good news. The constants are already done 🙂

Managed inheritance

One of the coolest new features of CoolBasic V3 will be classes and their inheritance. In this blog entry, I’m not going into details, nor will I summarise exactly which aspects belong to it in CoolBasic’s case (I’ll leave that for later), but I’d like to share some conclusions on how to preserve data integrity through inheritance.

CoolBasic will support fully hierarchical inheritance across multiple classes. As in Visual Basic .NET, there will be “Me” and “MyBase” keywords, which point to the current class and to the parent class the current class was derived from. By default, a function that exists in both the derived and the parent class resolves to the “local scope” first, i.e. to the derived class’ version. To call the parent class’ function specifically, the “MyBase” keyword must be used together with the member access operator, for example “MyBase.MyFunction()”.

ChaosBasic (read more at CoolBasic forums) introduced a syntax which allowed multiple inheritance when a new class is created. Parent classes were separated by commas, such as: “Class OpacityCow Inherits Cow, OpacityObject”. Now the question is… where does MyBase point to? How do you access the parents’ constructors? There’s obviously a design flaw in this concept, so I decided not to implement multiple inheritance in this way. Instead, a much more coherent approach is to allow only one parent per class. If the programmer wishes to link multiple classes via inheritance, it has to be done as a chain, like a->b, b->c.
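The chain idea can be sketched in Python, where `super()` plays roughly the role of MyBase. The class names are borrowed from the ChaosBasic example above, but the order of the chain (Cow deriving from OpacityObject) and the `describe` function are assumptions made only for this illustration.

```python
class OpacityObject:
    def describe(self):
        return "opacity object"


class Cow(OpacityObject):        # Cow inherits OpacityObject
    def describe(self):
        # super() stands in for MyBase: there is exactly one direct parent
        return "cow, which is an " + super().describe()


class OpacityCow(Cow):           # OpacityCow inherits Cow
    def describe(self):
        return "opacity cow, which is a " + super().describe()


text = OpacityCow().describe()
```

With a single parent per class there is never any doubt about which class the parent reference means, and each level only ever talks to the level directly above it.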

In VB.NET, C#, and Java, access is restricted to the direct base class to prevent inconsistencies in an object’s state. That’s why MyBase.MyBase does not exist. Some languages, however, do allow traversing further up the underlying classes… e.g. with “::”. I haven’t decided yet whether or not I should allow this. For those who are interested in knowing more about the pros and cons of grandparent references, I recommend visiting http://bytes.com/forum/thread372516.html.

This “safety net” applies to class constructors as well; CoolBasic V3 will force you to call the parent’s constructor in the derived class’ constructor. This ensures that all constructors are called when a new instance is created, which in turn results in a complete and “well-formed” class instance.
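Here is a small Python sketch of what constructor chaining guarantees. Python itself does not enforce the parent call, so the explicit `super().__init__` line only marks the spot where CoolBasic V3 would insist on it. The classes and fields are hypothetical.

```python
class Animal:
    def __init__(self, name):
        self.name = name
        self.initialised = ["Animal"]   # records which constructors have run


class Cow(Animal):
    def __init__(self, name, opacity):
        super().__init__(name)          # the call CoolBasic V3 would make mandatory
        self.opacity = opacity
        self.initialised.append("Cow")


cow = Cow("Bessie", 128)
```

If the parent call were left out, `cow.name` would not exist at all, which is precisely the kind of half-built instance the rule is meant to prevent.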

Comment parsing

A few things about source code comments are worth mentioning. First of all, both singleline comments and multiline comments will still exist in V3. Comments will be written pretty much the same way they’re written in C and related languages. That is, singleline comments begin with double slashes “//”, and multiline comments are written as a block that starts with “/*” and ends with “*/”. No more remstart/remend.

For now, multiline comments are already stripped in the file reading pass, not in the lexical phase (singleline comments are handled in the lexical phase, though). I thought it’d be more efficient in terms of compilation time if lines that wouldn’t be processed anyway were ignored completely instead of going through the normal tokenization. Because of this, a comment can’t start or end in the middle of a logical line. Obviously this could be fixed easily, but it’s extra work and some code recycling – which is rarely a good thing. This means that “/*” and “*/” must each be alone on their own line, similar to how remstart and remend currently work. It may take some time to get used to this, but as it stands now, my decision is justifiable.
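The rule described above is simple enough to sketch in a few lines of Python. This is not the compiler's actual code: the function name and the choice to raise an error on an unterminated comment are assumptions.

```python
def strip_block_comments(lines):
    """Drop multiline comments whose markers sit alone on their own lines."""
    kept = []
    inside = False
    for line in lines:
        marker = line.strip()
        if not inside and marker == "/*":
            inside = True            # comment opens: skip this line
        elif inside and marker == "*/":
            inside = False           # comment closes: skip this line too
        elif not inside:
            kept.append(line)        # ordinary line, passed on to the lexer
    if inside:
        raise ValueError("unterminated multiline comment")
    return kept


source = [
    "a = 1",
    "/*",
    "b = 2  this line never reaches the tokenizer",
    "*/",
    "c = 3 // singleline comments are left for the lexical phase",
]
remaining = strip_block_comments(source)
```

Because the test is a whole-line comparison, the reader never has to scan inside a line for comment markers, which is where the time saving comes from.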

Optimizing and pre-evaluating

Many modern compilers try to simplify mathematical expressions at compile time in order to ease the burden on the final executable. Naturally this results in faster runtime code execution in some cases, and even the size of the executable may shrink a bit. I’ve been working on similar mechanics for the CoolBasic V3 compiler, and at this point I’ve grown pretty satisfied with them. Most constant expressions will now be pre-calculated, meaning that statements get shorter before the actual syntax checks begin. All of this will help me keep development as bug-free as possible. Compilation based on very, very strict rules tends, in the end, to prevent exceptions in the algorithms. Simplicity is beautiful.

This kind of optimization is, however, very hard to carry out perfectly. For example, the following expression is easy to optimize:

1 + 2 + a [into] 3 + a

But regrouping is a bit trickier, because the constants are no longer next to each other:

1 + a + 2 [into] 3 + a

For now, I’ve decided to leave it as it is, i.e. only the first kind of expression gets optimized.
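The difference between the two cases shows up clearly in a tiny left-to-right folder. The Python sketch below is an assumption about the general technique, not the compiler's real algorithm. It only understands flat chains of "+", and the token representation (ints for constants, strings for variables) is invented for the example.

```python
def fold_adjacent_constants(tokens):
    """Fold `const + const` pairs left to right in a flat '+' expression.

    `tokens` alternates operands and "+" signs, e.g. [1, "+", 2, "+", "a"].
    Operands are ints (constants) or strings (variable names).
    """
    result = [tokens[0]]
    for i in range(1, len(tokens), 2):
        operator, operand = tokens[i], tokens[i + 1]
        if operator != "+":
            # Other operators need precedence handling, which this sketch omits
            raise ValueError("only '+' is supported in this sketch")
        if isinstance(result[-1], int) and isinstance(operand, int):
            result[-1] += operand              # both sides known: evaluate now
        else:
            result.extend([operator, operand])
    return result


folded = fold_adjacent_constants([1, "+", 2, "+", "a"])      # constants adjacent
unfolded = fold_adjacent_constants([1, "+", "a", "+", 2])    # constants separated
```

The first call gives `[3, "+", "a"]`, while the second comes back unchanged because the variable sits between the two constants. Folding that case would mean reordering operands, which is the trickier regrouping step.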

Another funny part of this implementation of pre-evaluation is that the compiler’s source code almost doubled just by adding this feature. You can’t really say it wasn’t worth it, though. As a result, you gain:

  • Faster runtime execution
  • Faster compilation
  • Smaller executable
  • False conditions (like If/Until/While 0) can be stripped from the final executable
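The last point can be sketched as a pass that runs after conditions have been pre-evaluated. The Python below is only an illustration under assumptions: the tuple-based statement format is made up, and only `If` blocks are handled.

```python
def strip_constant_false_ifs(statements):
    """Remove `If` blocks whose condition was pre-evaluated to the constant 0.

    Each statement is either ("stmt", text) or ("if", condition, body), where
    condition is an int (already folded) or a string (still unknown).
    """
    kept = []
    for statement in statements:
        if statement[0] == "if":
            _, condition, body = statement
            if condition == 0:
                continue                       # can never run: drop the block
            kept.append(("if", condition, strip_constant_false_ifs(body)))
        else:
            kept.append(statement)
    return kept


program = [
    ("stmt", "Print 1"),
    ("if", 0, [("stmt", "Print 2")]),          # e.g. a debug block switched off
    ("if", "a", [("stmt", "Print 3"), ("if", 0, [("stmt", "Print 4")])]),
]
stripped = strip_constant_false_ifs(program)
```

Nested blocks are cleaned recursively, so a dead branch inside a live one disappears as well.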
