Compiler part 1/2 is done

Another big part of the CoolBasic Classic Compiler is now complete. The CB Classic compiler is a standard 2-pass compiler, meaning that it makes two major sweeps over the code while transforming it from its textual representation into byte code (the binary form is then consumed by the Cool VES runtime and game engine). The first pass has but one purpose: it parses all code lines and creates lists of symbols, like functions and variables. It also picks out all tokens that form expressions. Expressions are used in statements such as If, While, and assignment. We chose the 2-pass approach so that, for example, functions don’t need to be declared “above” any statements that use them. The first pass is thus essentially a gathering phase, and that is now complete. The current status is illustrated in the image below:

CoolBasic Classic Compiler - status 3/2011

Green: done
Yellow: in progress
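
To make the 2-pass idea concrete, here is a minimal Python sketch. It is not the actual C# compiler code: the tiny "func NAME" / "call NAME" language and the function names `first_pass` and `second_pass` are assumptions made up for illustration. The point is only that the first sweep gathers every declaration, so the second sweep can resolve a call that appears above the function it refers to.

```python
# Hypothetical mini-language: "func NAME" declares a function,
# "call NAME" uses one. This is not CoolBasic syntax.

def first_pass(lines):
    """Gathering phase: record every declared function name."""
    symbols = set()
    for line in lines:
        kind, name = line.split()
        if kind == "func":
            symbols.add(name)
    return symbols


def second_pass(lines, symbols):
    """Resolve every call against the complete symbol list."""
    errors = []
    for number, line in enumerate(lines, start=1):
        kind, name = line.split()
        if kind == "call" and name not in symbols:
            errors.append(f"line {number}: unknown function '{name}'")
    return errors


# DrawPlayer is called before it is declared, which is fine with two passes.
source = ["call DrawPlayer", "func DrawPlayer", "call Missing"]
errors = second_pass(source, first_pass(source))
print(errors)
```

A single-pass design would have to reject the first line, because `DrawPlayer` is not yet known when the call is read.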

Personally speaking, writing parsers for 40 different kinds of statements was a little repetitive and mechanical work (yes, we use specialized parsers instead of a state table and stack), but the boring part is now done. I’d say we’re definitely getting somewhere here. There are already 20,431 lines of C# code in 270 files, so the compiler project alone is pretty huge.

CB Comic #4

Introducing KilledWhale (the smartass).

Experimental file diff

Before we begin, I’d like to say that the feature demonstrated herein will NOT be part of the first release of Cool Developer and CoolBasic Classic; I’m merely playing with a cool idea.

I’ve been interested in versioning and Source Control technologies in general lately. One part of version control is the ability to handle conflicts when two or more developers check out a file for editing, make their changes and then check them back in. Two simultaneous check-outs will often result in a conflict within the source file, and these conflicts have to be resolved through merging. Merging means that both developers’ modifications and fixes are applied to the source file, without either one’s changes getting lost in the process. Merging is one thing, but analyzing the differences is another. I decided to try out some difference tracking techniques.

First of all, there are a number of algorithms available. Some are easier to understand than others, and some support more features (such as the ability to detect moved lines and code blocks). The most popular method I found is based on the longest common subsequence (LCS) problem, and it is, in fact, used in most Version Control software. It’s a good method of producing a script that can be used to “patch” the destination file with tiny changes so that it becomes identical to the source file. Basically you get a set of “add this to place X” and “remove this from place Y”. Unfortunately, while this technique indeed gets the job done nicely, visualizing these kinds of changes can be confusing. When a conflict occurs, for example, determining whether the merge would break something can be tricky.

In addition, due to how LCS works, comparing two very big files (10,000 lines or more) quickly starts to eat giant chunks of memory because the algorithm operates on a matrix of n*m, where n is the number of code lines within the first file, and m is the number of code lines within the second one. There are a number of ways to optimize memory consumption and matrix size, but that is out of the scope of this blog post.

Ideally, when a developer encounters a conflict (and a merge needs to be done), he or she will be provided with two source files side-by-side: one from the server and the other the local copy. This view should visually present those parts that differ between the two versions. If the developer spots a problem that would break the code upon auto-merge, there should be a way to edit the file before committing. LCS cannot by itself provide a good enough visual presentation because it can only report insertions and deletions, but not modifications. For example, editing a single line would produce both a “delete” and an “addition” changeset.
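
The behavior described above can be reproduced with a small Python sketch of a textbook LCS-based line diff. This is an illustrative implementation, not the code of any particular version control tool; the function names `lcs_table` and `diff` and the three sample lines are assumptions. Note that the table has (n+1)*(m+1) cells, which is where the memory goes for large files, and that a single edited line comes out as one delete plus one insert.

```python
def lcs_table(a, b):
    """Build the (len(a)+1) x (len(b)+1) table of LCS lengths.

    table[i][j] holds the LCS length of a[i:] and b[j:].
    """
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def diff(a, b):
    """Walk the table and emit ('=', line), ('-', line) or ('+', line)."""
    table = lcs_table(a, b)
    i = j = 0
    script = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            script.append(("=", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            script.append(("-", a[i]))  # line only in the old file
            i += 1
        else:
            script.append(("+", b[j]))  # line only in the new file
            j += 1
    script.extend(("-", line) for line in a[i:])
    script.extend(("+", line) for line in b[j:])
    return script


old = ["Dim a As Integer", "a = 1", "Print a"]
new = ["Dim a As Integer", "a = 2", "Print a"]
edit_script = diff(old, new)
for marker, line in edit_script:
    print(marker, line)
```

The middle line was merely modified, yet the script contains a "-" entry followed by a "+" entry; pairing those up into a single "modify" is left to whatever sits on top of the diff.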

Then I came across the Patience Diff algorithm. All in all, it would seem to fit the purpose perfectly – it offers a very nice and clear presentation of changesets between two files and doesn’t take an awful lot of memory either. I spent hours and hours trying to find a .NET implementation of it, but as of yet there apparently just aren’t any (the number of “good” C# implementations of the other algorithms is quite small as well). So I started working on a proof-of-concept. I spent a little less than 2 days on this challenge, and finally came up with a working solution. As I’m writing this, my work is the only C# implementation of the Patience Diff algorithm that I know of 🙂 . As a side note, there’s seemingly only one Version Control product that uses Patience Diff as its main tool, Bazaar, but apparently one can optionally enable it for Git as well.
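
For readers who have not seen Patience Diff before, the following Python sketch shows only its anchor-finding step as the algorithm is commonly described: take the lines that occur exactly once in both files, then keep the longest run of them that appears in the same order on both sides (found with a patience-sorting style longest increasing subsequence). It is not the C# proof-of-concept mentioned above, the function names are assumptions, and a complete implementation would go on to diff the gaps between the anchors.

```python
from bisect import bisect_left
from collections import Counter


def unique_common_lines(a, b):
    """Pairs (i, j) where a[i] == b[j] and the line is unique in both files."""
    count_a, count_b = Counter(a), Counter(b)
    index_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    return [(i, index_b[line]) for i, line in enumerate(a)
            if count_a[line] == 1 and line in index_b]


def patience_anchors(a, b):
    """Longest list of unique common lines that keeps the same order in both."""
    pairs = unique_common_lines(a, b)   # already sorted by position in a
    tails = []       # tails[k]: smallest b-index ending a run of length k + 1
    tail_pair = []   # index into pairs of the pair stored at tails[k]
    previous = [None] * len(pairs)      # back-pointers for reconstruction
    for n, (_, j) in enumerate(pairs):
        k = bisect_left(tails, j)
        if k == len(tails):
            tails.append(j)
            tail_pair.append(n)
        else:
            tails[k] = j
            tail_pair[k] = n
        previous[n] = tail_pair[k - 1] if k > 0 else None
    anchors = []
    n = tail_pair[-1] if tail_pair else None
    while n is not None:
        anchors.append(pairs[n])
        n = previous[n]
    return anchors[::-1]


left = ["x", "A", "B", "C", "y"]
right = ["C", "A", "B", "z"]
anchors = patience_anchors(left, right)
print(anchors)
```

Here "C" is unique in both files but has moved, so it is not used as an anchor; "A" and "B" are.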

Here’s a sample of two slightly different files: the left side represents the original document, whereas the edited version is on the right:

Different files

The following difference graph can be generated from them (green = insert, red = delete, yellow = modify):

Difference analysis

So what does this mean for CoolBasic? I don’t really know yet, but I have some ideas 🙂

Compiler analysis techniques

The CoolBasic Classic compiler has progressed again, and this time I’d like to share some interesting new features about its internal architecture. Using C# enables me to do certain things much more easily than the old procedural approach and the Object Oriented design really starts to kick in. The new compiler is highly structural, and it now records much more data about declared symbols, scopes, and execution paths. This also serves as a good foundation for some code analysis techniques. For example, the compiler will emit warnings for variables, types, and functions that are declared but never used, as well as if a variable is used before it has been initialized. Warnings like these will encourage the users to write better and cleaner code.

The original CoolBasic has almost no optimizations regarding the generated byte code. This is now different: Constant value analysis can ignore code branches (such as If/Else) if the condition can be evaluated to True or False at compile time. Thus, redundant code will not even make it to the final compiled program. Also, since we have constant value pre-evaluation, some expressions will be simplified before conversion to byte code. This ought to improve the runtime performance. In addition, thanks to internal scope-specific dictionaries, resolving branch targets no longer requires a linear search through all recorded labels (user-defined or generated). This should improve compile time performance greatly for large programs.
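
As a rough illustration of constant value pre-evaluation and branch elimination, here is a small Python sketch. The tuple-based expression format and the names `fold` and `prune_if` are hypothetical and have nothing to do with the compiler’s real C# data structures; the sketch only shows the general technique.

```python
import operator

# Hypothetical operator table for the sketch.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "=": operator.eq,
    "<": operator.lt,
}


def fold(expr):
    """Simplify an expression.

    An expression is a literal (int or bool), a variable name (str),
    or a tuple (op, left, right).
    """
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, (int, bool)) and isinstance(right, (int, bool)):
        return OPS[op](left, right)      # both sides known at compile time
    return (op, left, right)             # keep the (partly simplified) node


def prune_if(condition, then_block, else_block):
    """Drop the branch that can never run when the condition is constant."""
    condition = fold(condition)
    if condition is True:
        return then_block
    if condition is False:
        return else_block
    return [("if", condition, then_block, else_block)]


print(fold(("+", "x", ("*", 3, 4))))
print(prune_if(("=", 1, 2), ["then code"], ["else code"]))
```

In the second call the condition folds to False, so only the Else block survives and the If itself never reaches the output.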

We’ll see how far we end up going with code analysis in the future… I’d love to collect data such as Cyclomatic Complexity or Number of Executable Statements in order to derive a general Maintainability Index out of the given source code.

Local scoping
All scopes now have their own list of local variables. This means that the user can declare variables in a code block such as If, For, or Case. These variables are allocated when the execution begins in that block and they will cease to exist after the execution exits the scope. Therefore, it is possible to declare several variables that have the same name as long as they aren’t conflicting in an enclosing block. Consider the following example:

Dim a As Integer

If a = 1 Then
    Dim b As Integer = 2
    // Variable 'b' is only visible within this If block.
	
    // You cannot declare variable 'a' here because it is already declared 
    // outside.
EndIf

If a = 1 Then
    Dim b As Integer = 3
    // Variable 'b' is visible within this If block and any child blocks.
    // Variable 'b' is different than the one declared in the previous 
    // If block.
	
    For i As Integer = 1 To 10
        // Variable 'i' is only visible within this For block.
		
        // Variable 'b' is also visible here since it was declared 
        // in an enclosing block.
        b = b + 1
    Next
EndIf

It is also possible to declare scope specific constants in the same way. Upon entering a scope the Runtime will ensure that all of its local variables are initialized with zero. Similarly, local arrays and strings can now be freed upon leaving the scope.
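
One way to picture the bookkeeping behind these rules is a chain of scopes, each with its own dictionary of locals. The Python sketch below is an assumption about how such a structure could look (the `Scope` class and its methods are made up for illustration); it mirrors the CoolBasic example above, where two sibling If blocks may both declare 'b' but neither may redeclare 'a'.

```python
class Scope:
    """A code block with its own local variables and a link to its parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.variables = {}

    def is_visible(self, name):
        """True if the name is declared here or in any enclosing scope."""
        scope = self
        while scope is not None:
            if name in scope.variables:
                return True
            scope = scope.parent
        return False

    def declare(self, name, type_name):
        if self.is_visible(name):
            raise NameError(
                f"'{name}' is already declared in this or an enclosing scope")
        self.variables[name] = type_name


root = Scope()
root.declare("a", "Integer")

first_if = Scope(root)
first_if.declare("b", "Integer")

second_if = Scope(root)
second_if.declare("b", "Integer")     # allowed: a sibling block, not a parent

for_block = Scope(second_if)
for_block.declare("i", "Integer")

print(for_block.is_visible("b"))      # 'b' comes from the enclosing If block
print(first_if.is_visible("i"))       # 'i' lives in an unrelated block
```

Calling `first_if.declare("a", "Integer")` would raise `NameError`, matching the comment in the first If block of the example.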

Short-circuit And/Or
We’re also changing how the Boolean And and Or operators work. They are now ‘short-circuit’, meaning that if the end result of the Boolean operation can be determined by just evaluating the first (left) operand, then the program will not evaluate the right half at all. Again this will improve runtime performance, but users need to pay close attention to their code if the right side has any function calls that need to be executed regardless of the end result of the Boolean operation. This is easily fixed, though, by storing the right side expression into a temporary variable first.
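
Python’s `and` happens to be short-circuit as well, so it can be used to demonstrate both the pitfall and the temporary-variable fix. The helper `check` and the variable names below are made up for the demonstration; this is an analogy, not CoolBasic code.

```python
log = []


def check(name, value):
    """Stand-in for a function call with a side effect (it records the call)."""
    log.append(name)
    return value


# Short-circuit: the left operand is False, so the right call never happens.
short = check("left", False) and check("right", True)
after_short = list(log)

# Fix: evaluate the right side into a temporary variable first.
log.clear()
right_value = check("right", True)
safe = check("left", False) and right_value
after_safe = list(log)

print(after_short)
print(after_safe)
```

One thing to keep in mind with the fix is that it also changes the order of evaluation: the right-hand call now runs before the left-hand one.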

Hello 2011!

Okay… where to start. The year changed. Gosh, that went fast. I can remember writing tons of stuff and assembling the DevTeam 12 months back like it was yesterday. Obviously we didn’t make it in 2010 like originally planned, but the CoolBasic Classic project continues still. Actually, I just went ahead and changed the year from 2010 to 2011 on coolbasic.com (…and yes, it’s a common joke amongst project managers that deadlines should indicate the month, but intentionally leave the year unspecified 🙂 ). I’ve been a little slack the past few weeks, I admit, but it’s been hectic at work and other members of the Team have had all kinds of stuff going on in their lives (who wouldn’t). It being a year now, the DevTeam members’ contracts are naturally coming to an end soon. This means it’s time for a round of renewals! That being said, a few people are leaving their duties due to a number of things, but the majority have expressed their interest in continuing in the Team. And that’s great! For now, we have decided not to open new positions, and will recruit more only when needed – who knows, perhaps some of the original members will make a come-back. The organization structure is going to change a little bit, and details will be published later.

So what’s been done since last time… well, most of it has been management oriented. I feel it’s most convenient for you if I just list the things:

  • We have explored alternatives to SVN version control, like Mercurial
  • We actually purchased a Virtual Server for web hosting
  • We now have domain coolbasic.fi (and it’s already operational on the Virtual Server)
  • We have configured PHP and MySQL databases
  • We have explored ways to use LDAP authentication services
  • We have explored anti-spam measures to improve the forum experience
  • The PureBasic accounts have been terminated since we now use other tools

The web host isn’t fully configured yet, but we’re working on it. When that’s done, we can set up a true testing environment for web developing (full copy of the portal, forums, everything).

I saved some money by not having to purchase a new set of PureBasic one-year licenses, but then again… I did purchase the Virtual Server, and it costs roughly the same. No gain there. In addition, I just recently purchased a license of ReSharper which was around 200 €. I use it together with StyleCop at work, so I’m quite used to it. Coding without these essential tools feels somehow awfully wrong nowadays. ReSharper is an extension for Visual Studio that provides advanced code analysis and refactoring tools that will help you write better code that is styled and constructed according to “best practices”. It helps to improve code readability and maintainability, and overall I think the gain is notable. I’ll be spending at least a few days refactoring my existing code now…

Fortunately for me, I’ve had a chance to experiment with some of my compiler specific ideas at work as well (I’m working on a project where I am able to use this expertise of mine), so even though not much code has been added to the CBC compiler project for the past 2 months, I have, in fact, done something useful that will help me achieve my goal in the CoolBasic project (I proved that some techniques and theories work in practice).

Until next time…

Got it working with Linux!

One of the to-be-implemented-next features of the Cool Developer editor is to get it to perform an actual “Build & Run” action using the CoolBasic Classic compiler and the Cool VES linker. Once this is done it’s obviously possible to start coding in CoolBasic (although the runtime and compiler aren’t fully working yet). Anyways, to get this wheel rolling I prepared dummy compiler and dummy linker executables that can be used to simulate a Build process. To hook ’em up I also prepared a CoolBasic Classic Cool VES game project template that Cool Developer consumes. The project template consists of a set of XML files that describe the allowed project items and how they’re presented in the editor. They also define how this project type needs to be built. Currently, the CoolBasic Cool VES project has two build steps: 1) compilation, and 2) linking. If one fails, the entire chain fails and is terminated immediately.
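
The “fail fast” chaining of build steps can be sketched in a few lines of Python. The function `run_build` and the stand-in step functions are hypothetical; the real steps are described by the project template’s XML files and run external executables, which this sketch does not attempt to model.

```python
def run_build(steps):
    """Run the steps in order and stop at the first one that fails.

    steps is a list of (name, action) pairs, where action() returns True
    on success. Returns (succeeded, completed_step_names, failed_step_name).
    """
    completed = []
    for name, action in steps:
        if not action():
            return False, completed, name   # the chain ends immediately
        completed.append(name)
    return True, completed, None


def failing_compile():
    return False   # stand-in for a compile step that reported an error


def link():
    return True    # stand-in for a successful link step


result = run_build([("compile", failing_compile), ("link", link)])
print(result)
```

Because compilation fails, the link step is never invoked.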

I mentioned “dummy executables”. In actuality, they’re the real CoolBasic Classic compiler and the real Cool VES linker, but they aren’t feature-complete yet. For example, the compiler only has lexical analysis and the linker only creates an empty (yet valid and runnable) executable. Both can be tested by intentionally providing invalid source (such as CB source with invalid parentheses or mismatched string quotation). This enables us to test error reporting and build step chaining properly back in Cool Developer.

I uploaded this package for the DevTeam a few days ago; however, it was only for Windows. I decided that the sooner I can verify that it works on Linux as well, the better. Finding out that you have to change half of your code when it’s time to deliver would suck big-time. So I installed Linux on a VMware virtual machine this weekend. I then went ahead and fetched Mono and set up the CoolBasic Classic compiler and Cool VES linker projects in MonoDevelop. They both compile out-of-the-box and seem to work properly. I was a bit surprised how easy it was. I have zero Linux experience. None, whatsoever. Now I just need to figure out how to share a folder (or something) so that my Windows 7 host and my Ubuntu can share stuff… or just link them both to SVN somehow. Lots of research work to do.

I haven’t yet gotten started with the Parsers, but let’s see what the forthcoming week produces.

Definition file implementation

Today’s post is a status update to CoolBasic Classic compiler. From the architectural aspect, most entities and interfaces are now implemented. The most notable ones include messages, symbols, keywords, operators, tokens and definition nodes. There are already almost 130 files in the C# project. Also the lexer is now fully operational so the first “major” part is done.

Late yesterday I finished the Cool Framework Definition File importer. What this means is that the compiler can now be made aware of Cool VES symbols such as functions and constants. For example, the constant “PI” is built-in to Cool VES in the same way as “KEY_ESC” or “COLOR_RED” will be in the future. I also tested how overloaded functions import, and that part is covered as well. Overall, the importer should be done now.

What if the user wants to declare a function or constant of the same name as one already provided by the framework definition? For example, a user-defined function named “LoadObject” (which has a fingerprint identical to the framework version). Which would the program end up invoking? In such a situation there are two options: 1) Report a compile error for an ambiguous symbol, or 2) Always resolve to the user-defined symbol first. I don’t like the first option because it has the potential to break code if the framework changes (such a symbol is added in the future, for instance).

One important thing to know about Name Resolution is that it bubbles up the tree. That is, if no match is found in the current context, the search continues in its parent symbol’s context until a match is found or the entire tree has been processed. It never iterates the children of an upper level, though. In CoolBasic Classic the prime scope is the Root – all functions and the main program belong to the Root. In addition, there’s one more scope the Root belongs to, but to which the user has no access. It’s the Global scope, and that’s where the imported symbols go. Thus, Name Resolution will stop at the Root level (user code) if a match is found, and will only proceed to the Global context if it wasn’t. Therefore, user-defined symbols will override any identical framework symbols. There’s one thing to note, though. If the signatures don’t match, but the names do, Name Resolution stops at the Root scope and will report a compile error if no compatible signatures can be found (so it’s still possible for a framework update to break existing code, but only if the user intentionally tries to invoke the framework version).
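
The bubbling lookup can be sketched as follows in Python. The `Context` class, the `resolve` function, the exact-match signature comparison and the function name “Distance” are all assumptions for the sake of the example (only “LoadObject” comes from the text above); the real compiler presumably has richer rules for what counts as a compatible signature.

```python
class Context:
    """A scope that knows its parent and the functions declared in it."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.functions = {}   # function name -> list of parameter-type tuples

    def declare(self, name, signature):
        self.functions.setdefault(name, []).append(signature)


def resolve(context, name, signature):
    """Return the name of the context whose declaration gets invoked."""
    while context is not None:
        if name in context.functions:
            # The search stops at the first context that knows the name,
            # even when none of its signatures fit.
            if signature in context.functions[name]:
                return context.name
            raise LookupError(
                f"no compatible signature for '{name}' in {context.name}")
        context = context.parent      # bubble up; children are never searched
    raise LookupError(f"'{name}' is not declared")


global_scope = Context("Global")                  # imported framework symbols
global_scope.declare("LoadObject", ("String",))
global_scope.declare("Distance", ("Float", "Float"))

root = Context("Root", parent=global_scope)       # user code
root.declare("LoadObject", ("String",))           # same fingerprint as framework
root.declare("Distance", ("Integer",))            # same name, other signature

function_body = Context("Function", parent=root)

print(resolve(function_body, "LoadObject", ("String",)))
```

The lookup for “LoadObject” ends in the Root, so the user-defined version wins. Resolving “Distance” with `("Float", "Float")` raises an error instead of reaching the framework version, which is the caveat mentioned at the end of the paragraph above.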

The image below illustrates the current compiler status. Green boxes are considered “ready”.
CoolBasic Classic compiler status (2010-11-01)

The ‘Classic’ compiler in C#

First things first: I’d like to correct the wording of a statement in one of my previous posts. I mentioned this “another compiler that didn’t turn out that good”. Some people are now under the impression that C# is to blame for the project in question now being in a frozen state. Maybe I phrased it poorly, but the fact that it was developed in C# at first has little to do with its failure. The author clarified to me that he dropped C# and switched over to C++ just for the sake of re-writing, and not due to performance issues. Personally, I don’t buy that, but that doesn’t matter. What matters is that a lot of people are paying a lot of attention to what I actually say, and I should be more careful about how I express my thoughts in this blog. So let me emphasize once and for all… the other compiler project was ‘successfully’ developed in C#, and later in C++, but is now paused for an undetermined time due to completely unrelated reasons. There. Case closed. He’s watching my blog very closely, though 😉

And then onto other issues at hand. The new compiler (written in C#) is progressing nicely. In the beginning there’s naturally lots of plain setup to do just to get most of the needed entities created. It’ll get more interesting soon, though. For the past week, I’ve used approximately 4 to 6 hours almost every day after work to establish the new compiler solution. That’s a healthy 12-14 hours of programming a day 😉 . Time flows easily when you listen to inspirational music. Currently, the compiler parses the command line arguments, initializes a build job, and invokes the lexer on the source file. Half of all token types are already recognized, and there’s also one dummy parser (which, for testing purposes, only prints all tokens to the console output). I expect to finish the lexical analysis on the remaining token types soon. And then I can start implementing the statement parsers – which ought to be interesting because I now have some new tools at my disposal, thanks to the object-oriented platform.

I decided to take a slightly different approach to error reporting. All the old compilers shut down the process immediately after encountering the first compile error. This new design doesn’t do that; instead, the compiler creates a list of compile errors and prints them to the standard error stream. This allows the Cool Developer editor to compose a neat listing the user can then iterate through, fixing several issues at once before re-building. This will probably save time on trial and error, and thus enhance productivity. Basically, this concept requires the compiler to be able to recover from compile errors and simply continue the process from the next statement. Oh, architectural joy! In addition, the compiler now also supports warnings, i.e. messages that don’t count as errors, but still hold information about problems in code. Warnings don’t cause the compilation to fail, but they too will be listed in Cool Developer’s user interface.
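
The recover-and-continue idea can be illustrated with a short Python sketch. The parenthesis checker stands in for a real statement parser, and `check_statement` and `compile_all` are names invented for this example; the message format is likewise an assumption, not the compiler’s actual output.

```python
import sys


def check_statement(text):
    """Hypothetical stand-in for a statement parser.

    Raises SyntaxError when the parentheses in the statement don't balance.
    """
    depth = 0
    for character in text:
        if character == "(":
            depth += 1
        elif character == ")":
            depth -= 1
            if depth < 0:
                raise SyntaxError("unexpected ')'")
    if depth:
        raise SyntaxError("missing ')'")


def compile_all(statements):
    """Check every statement, collecting errors instead of stopping."""
    errors = []
    for number, text in enumerate(statements, start=1):
        try:
            check_statement(text)
        except SyntaxError as error:
            errors.append((number, str(error)))
            continue          # recover: carry on with the next statement
    return errors


program = ["a = (1 + 2)", "b = (3", "c = 4)", "d = 5"]
found = compile_all(program)
for number, message in found:
    print(f"line {number}: {message}", file=sys.stderr)
```

Both broken statements are reported in a single run, so the user can fix them together before re-building.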

As a bonus, you’d call the CoolBasic Classic compiler from command line like this (subject to change):

cbccompiler mygame.cbc /out mygame.obj /def coolves.fw

You can optionally enable Finnish interface by adding /lang fi

Possible platform changes

We recently had a discussion at our regular DevTeam meeting about exploring other possibilities for a proper development platform (and for the record, that meeting was one of the longest so far). There are a number of reasons why we think we should migrate away from a certain programming language, and base our code upon a “more supported” platform. For example, when the Chipmunk physics library was updated not so long ago, several problems emerged regarding cross-compiler generated code, which ultimately broke some compatibility. Now, one would expect that since it’s C it was standard enough to “just work” every time. In reality, however, our developers have had a hard time importing and invoking the 3rd party libraries (add SSE2 and dozens of compiler flags to the equation). We’ve got Chipmunk working in the current build, but there’s a high probability that it’ll break again in future library updates. All in all, fighting these kinds of problems is indeed quite frustrating (and uncalled for), especially when debugging these things is very difficult – if not impossible.

I think that procedural coding in general, an inadequate editor, a bit too simple tools, and a somewhat limited debugger will not serve our best interests in the long run. To me personally, it’s not too encouraging to open up our current development environment and start writing productive code anymore. I’m a professional C# developer, and this procedural coding (I once preferred) is becoming a mental burden for me. It has definitely affected my motivation in a negative way. I know I said a year ago that in order to create a fast and compact compiler, a procedural approach would be ideal. But I consider pleasant and powerful tools equally important because it enhances productivity of the programmer and maintainability of the project. I’m referring to Visual Studio (which in my opinion is the best IDE out there) as well as to Intellisense, refactoring, code analysis, unit testing, and other very powerful tools provided by it. After all, I think that if the compiler required a tad more memory and it took 2 seconds longer to build a game, it wouldn’t hurt too much (in a year the computers running the compiler are probably multiple times faster anyways).

So we’ve come to a point where we’re going to port our existing code over to C++ and C#. This will make a whole lot of new possibilities available to us: We can now harness the power of modern development environments, get more productive, and we no longer have to worry about the import files. We can trust that the generated ASM is correct and that using external libraries won’t conflict (that much, at least). There are pros and cons in object oriented design, but overall I’m confident that the marginal speed gain provided by procedural coding style isn’t going to outweigh anything. One of the most important things is that the developers are happy and get to work on something they enjoy.

The Cool VES game engine is going to be re-written in C++. This still enables us to inject ASM to where speed is critical. We’re also experimenting with a new multimedia library that ought to ease our job when implementing certain command sets in the future. We also continue developing Cool VES for both Windows and Linux.

I also mentioned C#. That’s for the CoolBasic Classic compiler! Now, this is quite interesting because there has been one project (that has nothing to do with us) I’m aware of, that attempted to write a compiler in C# about a year ago. It didn’t turn out that well, and currently the project in question is apparently frozen. One would think that this should be considered a warning example, but yet I’m willing to try doing the same thing because I really think it’s possible and quite doable. Yes, there are some serious challenges, and the architecture plays an exceptionally big role here. C# makes it possible to create a very high-level and sophisticated core for the compiler, and I’m very excited about it. Only a day after the meeting, I already had an initial architecture plan and a Visual Studio solution in place. Given that procedural model is seriously impacting my motivation, I actually believe I’ll get the job done faster now that I have good tools – even when it means I basically have to start over.

Many people associate C# with the Windows operating system. While it is true that C# is largely used to write software for the Microsoft .NET Framework, not everyone is aware that you can target programs written in C# to other platforms as well, like Linux. Just as Cool VES is intended to have both Windows and Linux versions, the CoolBasic Classic compiler should also be available on Linux. I’ll be using Mono for that, but more information will be published at a later time.

If this becomes an epic fail, I can always revert back to the old compiler, although I highly doubt it will. “I want to believe” 🙂 . In fact, I should have known better when I chose the development environment almost a year ago. I hope I get it right this time…

TL;DR: We decided to start using more powerful development tools, and we’re all now more productive and happy.

Introducing the Cool Developer editor

Hello again. Guess what, we have another video to offer! This time we’ll demonstrate Cool Developer, our forthcoming code editor and development environment. If you didn’t know, Cool Developer is the very same editor you’ll end up writing CoolBasic Classic games with in the future, so I think the video material you’re about to see is going to be very interesting. This sneak-peek introduces the editor at a very early stage, but it’s already working wonders and we thought it’s ready enough to be publicly announced.

This video is narrated by Antti Kajanus, a DevTeam member who is also a professional programmer and an expert in various Microsoft technologies. He’s an ASP.NET MCPD, and is currently working actively with Windows Presentation Foundation. He’s the main person responsible for Cool Developer. You may want to check his blog at excitingcode.com for more information and interesting stories about software development.

The following video shows quite a few interesting things, including a very powerful layout/windowing system, a solution explorer, an editor control, and the ribbon. The icons and other graphics are by no means final. The intention is to show the basic concept of the new editor. Enjoy!
