2009 Wrap-up

So the world has now entered a new decade. Let’s summarize what happened in 2009, and also peek into the future a bit.

For me, 2009 was a pretty solid year all the way through. I developed the CoolBasic V3 compiler and managed to finish it. Well, nothing is perfect the first time you do it, so I’ll still have to rewrite parts of it to improve performance in certain situations. However, since I came up with the idea of completely rewriting the current CoolBasic, V3 will have to wait. There are a number of good reasons why it’s smart to finish CoolBasic Classic before continuing V3 development. CoolBasic V3 is such a gigantic project, with so many things that could ruin it all, that it will definitely need a development team. The gap between the current CoolBasic and V3 would simply be too much for the current community to handle, making it very hard to establish a competent DevTeam. CoolBasic Classic is a step in that direction, and it gives me an opportunity to test the idea of a small (yet fully organized) development team, as well as some of the technical solutions already implemented in the new V3 compiler. CoolBasic Classic should provide a smooth step towards V3, but it’s still going to be a mainly procedural language. With CoolBasic Classic I hope we can grow the user base, giving us a better chance to find willing and competent people to develop CoolBasic V3 when the time comes.

I spent the last quarter of 2009 planning this CoolBasic Classic DevTeam thingy as well as writing the Classic compiler. I must say time has flown by; it feels like it was autumn only a moment ago. I spend several hours a day working on CoolBasic Classic and its DevTeam launch, but because of the big plans I also need to study a few technologies I’m going to use in this project. It’s too early to tell publicly what those are, but if all goes as planned, 2010 will be a very interesting year for the CoolBasic community. We’re definitely going to launch this year, and CoolBasic Classic will also be available fully in English! As a side note, the Finnish community has become very lively since the announcement, and there’s a lot of excitement in the air. I’m very happy to see the increase in forum activity.

The DevTeam application period has now closed, which means I’ll start going through the applications and then assemble the team. I wondered whether 31 days was too long a period for accepting web form applications, but they came in fairly evenly throughout December (with a slight front-load, though). I received 29 applications in total, which I think is quite good considering how strict and demanding the requirements were. All vacancies were filled, including the superior layer. The organization chart shown in the previous blog entry will change a bit, but the revised version won’t be published until the team has been finalized. All in all, I think we’re going to have a solid team covering different areas of competence, some even at a professional level. That said, interviews will be held soon, and I’ll make the final decisions about the composition of the DevTeam based on them. The DevTeam website has already been set up, the necessary documents have been written (and will be available to DevTeam members through the internal document service website), the initial assignments have been formalized, and the forum now has additional hidden subforums that normal users cannot see. I’ve spent countless hours preparing the DevTeam launch, and the material the members will have to read in the beginning may feel a bit overwhelming.

There are quite a few topics I’m going to discuss with the DevTeam, and as things get decided, I’ll post about them in this blog. Even with 15 people plus me, it’s still going to take months until we can launch CoolBasic Classic. Yeah, it’s THAT big of a project.

CoolBasic Classic DevTeam Applications

Time is now!

As of December 1st, we’re accepting DevTeam applications (Finnish only) via a web form. Open spots, requirements and detailed instructions have been announced in the Finnish forum here. Everyone who thinks they’re qualified (including existing staff) may submit one primary and, optionally, one secondary application for the following titles (including 3 officer/manager-ranked spots):

General Manager
Content Manager
Community Manager
Tech Developer
Web Developer
Developer
Graphics Artist
Sfx/Music Artist
Forum Administrator
Forum Moderator

That’s right, we have one superior layer. People who are not currently part of the CoolBasic community may also apply: if you know someone skilled enough who might be interested in participating in a fully organized development team for the upcoming CoolBasic Classic programming language and game making environment, please let them know. Those accepted will be contacted later in January 2010, after which they’ll receive their personal DevTeam account and thus gain access to internal web pages and confidential files. Candidates will also be interviewed. All applicants are bound by the DevTeam contract and NDA.

Preparing for the DevTeam

Three weeks ago I announced CoolBasic Classic and said that it’s going to be out before CoolBasic V3. I also mentioned a DevTeam that I’d assemble to help me with this project (it’s just so massive I can’t really do it alone in a reasonable time). My plans have now become clearer, and a major part of the preparation I need to do before launching the DevTeam is already in good shape. I won’t officially announce how to apply to the DevTeam just yet, but I probably will the next time I blog. All in all, I urge anyone truly interested in participating to monitor the forums closely over the next few weeks. This opportunity (being part of the DevTeam for CoolBasic Classic) only applies to Finnish people, at least for now.

So what have I been doing during these 3 weeks then? Mainly websites – for the DevTeam.

The DevTeam will have its own website, much like a company intranet but on a smaller scale. It includes (but is not limited to) a dashboard, a document storage, a ticket system, and administrative interfaces for web content (including the www-portal and online manuals). The first two are mostly done; the rest will be developed by the DevTeam itself. This website is, of course, secured and restricted from public access, except for the document storage system, which will host both public and internal documents. Each member of the DevTeam will receive their own userID and password to log in to the system. Members can also edit settings such as their email and password. I have paid great attention to the security of this website: the authentication module can prevent SQL injection, session fixation, XSS, CSRF, form spoofing, path traversal, and brute force attacks, to name a few. I’ve even implemented some advanced mechanisms against certain newly discovered attack techniques such as DNS rebinding and the protocol comment newline injection. Database credentials (and, of course, the documents/files) are inaccessible from the web browser, and they’re actually invisible to the web developer, too. It really has become my little experimental sandbox for a secure website. Ironically, though, I haven’t yet managed to enable SSL on my web server (gosh, I’m a programmer, not a sysadmin).

The document storage website is an interesting service, and it has taken the majority of my time. I now consider it “finished” (although I’ll probably write a visual administration tool for easy role assignments later). It supports a fully hierarchical category structure for the documents. For any file on the storage host, I can specify which user roles are allowed to access it, and roles are in turn assigned to users. Combined, these two make it very flexible to control who can access what. I can also suspend accounts or “retire” them. All member file requests (together with general logins/logouts from the authentication module) are also logged.
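The role check described above boils down to a set intersection. Below is a minimal Python sketch, assuming each document carries the set of roles allowed to read it, each user carries a set of assigned roles, and suspended or retired accounts are refused. The class and function names are hypothetical, and category inheritance and request logging are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class User:
    name: str
    roles: set[str] = field(default_factory=set)
    status: str = "active"  # hypothetical states: "active", "suspended", "retired"


@dataclass
class Document:
    path: str
    public: bool = False
    allowed_roles: set[str] = field(default_factory=set)


def can_access(user: User | None, doc: Document) -> bool:
    """Decide whether a (possibly anonymous) visitor may fetch doc."""
    if doc.public:
        return True  # public documents are open to every visitor
    if user is None or user.status != "active":
        return False  # anonymous, suspended and retired accounts see nothing internal
    # Granted when the user holds at least one role the document allows.
    return not user.roles.isdisjoint(doc.allowed_roles)
```

Keeping roles on both sides means a new document only needs its allowed roles set once, and changing a member’s access is a matter of editing that member’s roles.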

In addition to the DevTeam website, there are now database schemas for the forthcoming web portal and the CoolBasic Classic online manual. I will probably delegate at least a portion of the web portal development to someone in the DevTeam once I get it assembled, and for that I need to write some serious specifications. One thing is for sure: there will be 1-2 open spots for skilled and able web developer(s) within the DevTeam. Yeah, there’s that much work.

I’ve also sketched the organization chart, which illustrates the members of the DevTeam and their dependencies. In other words, I already know the open spots and what kind of people I’m going to need. Their skill requirements and “job descriptions” have also been planned and written down. There will be an application process. While this information will (probably) be published in much greater detail the next time I blog, I suppose I can safely say I’m looking for lots of different kinds of people with various expertise and skills: programmers, designers, specialists, web developers, content producers, and artists. Even managers. Also, the DevTeam will probably be extended with new members at some point in 2010.

So what am I waiting for… let’s do this!

Sorry, but I still have lots of work to do before I can start recruiting (I will do my very best to get things rolling before Christmas). Now that the web thingies are in such good condition, it’s mainly the CoolBasic Classic compiler I need to finish. And then there’s some serious writing to be done (but then again, I can probably do most of it during the recruitment period). Without a working compiler nothing else really advances, and that’s basically what you’re waiting for now… But worry not, I have a good feeling about this 🙂

CoolBasic Classic

First of all, this is purely a strategic decision. The idea has been around since Assembly Summer ’09 (held about 3 months ago), although it has been kept strictly secret. This news should delight the current CoolBasic user base. And here it comes: “We are announcing CoolBasic Classic, a procedural game programming language, and it will replace the current version.” It will also be released sooner than CoolBasic V3!

There have been rumors and speculation about further development of the beloved, easy, procedural game programming language, CoolBasic. Your concerns about the steep learning curve from procedural to object-oriented programming have been taken into account, and CoolBasic Classic is designed to continue where the current (outdated) Beta left off. Yes, CoolBasic Classic and CoolBasic V3 are two completely different products, although both are free BASIC-like programming languages designed for game creation. The choice of programming language is partly a matter of taste, and forcing everyone to move on to the object-oriented world seemed a bit too harsh, especially given that the user base is relatively young (c’mon, it’s a game making tool). We want to offer options so that users can choose the tool that best fits their interests. We also felt that it really is time for a serious technology upgrade: I’ve seen how bad the current one is, and it could be so much better. There are lots of other reasons behind the decision as well, but I’m sure you’ll figure them out yourselves eventually.

The current CoolBasic will undergo a complete overhaul. There will be a new Development Environment, a new Compiler, a new Virtual Machine, and a new User Manual. There will also be changes to the website (it will be reborn as a real portal) and the forums. All content will be available in both English and Finnish. The amount of work is too much for me alone, so I will also assemble a DevTeam with real responsibilities and assignments. I’ll talk about each of these aspects in greater detail in a moment.

The language
In a nutshell, CoolBasic Classic syntax will remain mostly the same as in the current version. Some features will become obsolete, whereas others will be fine-tuned and improved. All in all, users should be able to port their existing code without too many changes (beyond some find-and-replace, that is). However, due to the complete rewrite, the command sets may (and probably will) change a bit. This is necessary in order to implement some of the planned new features in a rational and consistent way. Syntactic changes, as well as the composition of the command sets, are something I’d like to hear other opinions about, and that’s one thing I need the DevTeam for.

The built-in data types mostly remain the same; only floats are now double-precision. The design also takes 64-bit integers into account for a possible later implementation. The language is also slightly more type-safe. You can now define arrays and functions of any type, or pass arguments of any type to a function, including typed arrays. In addition, you can now overload functions. New statement types may also be introduced later (enumerations, anyone?).
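Function overloading means the compiler has to pick an implementation by argument types as well as by name. Here is a minimal Python sketch of such an overload table, assuming exact type matches only (no implicit conversions) and case-insensitive names as in other BASIC dialects. The class, its methods and the labels are hypothetical, not the CoolBasic Classic compiler’s design.

```python
from __future__ import annotations


class OverloadTable:
    """Function name -> {parameter type signature -> implementation label}."""

    def __init__(self) -> None:
        self._overloads: dict[str, dict[tuple[str, ...], str]] = {}

    def declare(self, name: str, param_types: tuple[str, ...], label: str) -> None:
        # Names are folded to lower case (assumed case-insensitive language).
        signatures = self._overloads.setdefault(name.lower(), {})
        if param_types in signatures:
            raise ValueError(f"{name}{param_types} is already declared")
        signatures[param_types] = label

    def resolve(self, name: str, arg_types: tuple[str, ...]) -> str:
        # Exact match only; a real compiler would also rank implicit conversions.
        signatures = self._overloads.get(name.lower(), {})
        if arg_types not in signatures:
            raise LookupError(f"no overload of {name} accepts {arg_types}")
        return signatures[arg_types]
```

Declaring the same signature twice is rejected at declaration time, which mirrors the usual rule that overloads must differ in their parameter types.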

The compiler
This is already a work in progress, and it’s, in fact, half done! The CoolBasic Classic compiler also borrows technology from CoolBasic V3, and I’m really excited about it because this is a perfect chance to test that technology in practice as part of a slightly less complicated process. Since CoolBasic Classic is a procedural language, the compilation is much simpler. I can skip some phases that the CoolBasic V3 compiler performs, which results in very fast compilation and a somewhat smaller memory footprint. The new compiler is hundreds of times faster than the current CoolBasic compiler: compilation would finish in a few milliseconds even if the source code were tens of thousands of lines long.

The compiler is just plain better in every aspect now. It has much better parsing mechanics (no more special-case syntax bugs). For example, you can now declare multiple variables and constants within a single Dim/Global/Const statement, as well as initialize them. Moreover, all limitations are gone now. Yeah, the legendary function limit is no more (you can drop the hacking now, *cough*). Like the V3 compiler, this one fully supports Unicode and has been built with 64-bit architecture in mind. It is also a console application that can easily be called from 3rd party applications.

Virtual Execution System
CoolBasic Classic is still interpreted, although it has much better raw performance. I’ve named the new virtual machine “Cool Virtual Execution System”. CoolVES is a game engine framework offering functionality to display accelerated graphics, play sound, use sockets for networking, and access advanced game mechanics (maybe an in-built physics library). The graphics engine is finally fully compatible with Windows Vista and Windows 7, and it will no longer turn off Aero Glass upon launch. It uses DirectX 9 technology, although it’s possible to enable OpenGL, too. The sound engine is no longer bound to FMOD by default, so no more questionable licenses. Executable sizes will be smaller, and it should now be possible to change the icon without messing with UPX in the middle.

There will also be public documentation of the byte code’s structure, basically enabling anyone to write their own compiler whose output can then be run in the VES. The idea is to remove the limitation of “the language of choice”. Maybe we’ll see CoolC#, CoolJavaScript, CoolPython (I’d like to name this one “CoolBoa”, hehe), CoolPHP, or something similar in the future. In the end, it doesn’t matter which language you used: the game will run just fine in the VES, utilizing all the features it provides. We actually encourage community members to implement new languages on the CoolVES technology 🙂

The manual
Let’s bring this up to date as well. I already have the design ready, and the database schema is in the works. The details are up to the DevTeam, but the structure of the Classic manual will probably differ from the CoolBasic V3 User Manual. The focus is on online content.

The editor
The aim is to have a shared Development Environment for all CoolVES languages (and probably for CoolBasic V3 as well): you just choose from a list what kind of project to create, and which of the installed languages you want to work in. All game files naturally belong to a project, but don’t worry, CoolBasic Classic still supports the IncludeFile statement. The project-based approach, however, enables nice new ways to manage and load game media and other content. But that’s, once again, the DevTeam’s business.

Everything is based on modules, and they can be updated automatically. Everything is always up-to-date, and only the changed files will be downloaded. The purpose is to deliver bug fixes and updates as soon as they become available with minimum trouble. Modules can also be tools. Imagine game wizards that generate source code for you – or better yet, build a complete game for you without a single line of code!

The editor component will also be updated: I’m planning better syntax highlighting, code refactoring and full Intellisense, to name a few. Writing code in a modern and efficient way is the top priority.

The DevTeam
I need an orchestra! I can’t play this alone. The CoolBasic organization will be founded before the end of the year, with a full hierarchy and assignments. The CoolBasic staff will be born, and we’ll do development together, mmkay? For all this, I need volunteers willing to carry responsibility. I need developers (including tech and web developers). Confidential material, such as source code and technical documentation, will be shared among them. The required skills vary a lot, but I expect good knowledge of whatever each member was chosen for. I also expect that they have the time and a true interest in CoolBasic technologies, and that they are willing to improve CoolBasic products. Staff members don’t necessarily have to be master programmers; I need designers, writers, managers, and content providers, too. If you want to be part of the Finnish game making culture, and you want to aim high, then you fit the profile.

More information about the DevTeam’s tasks and how to apply will be posted on the official forums later.

Document Storage
For the past few weeks I’ve been building a website specialized in document storage. It has a login system and access permissions based on user privileges. While visitors have access to public documents, staff members may log in and gain access to certain internal documents, source code, database dumps, tools and generally everything meant for the DevTeam’s use. This project is also past the halfway point now, and it’s looking really good in general.

TL;DR: CoolBasic will be remade from the ground up as CoolBasic Classic, and it will be released prior to V3. More information about the DevTeam will appear on the official forums.

The Finnish translation of this info is also available here.

The new parser is underway!

It’s been a bit quiet ever since I finished the V3 compiler (well, kind of). Even though the compiler was fully functional, I realized that it had to be rewritten in order to gain more performance. Due to drastic changes to the key elements, I can’t copy-paste much of the previous code at all. And writing basically the same old code again is (big surprise) boring. I also finished my thesis, so I felt complete and wanted to take a few weeks off. I had been working hard for a whole year straight, after all.

So things have progressed a little slowly for the past few weeks, but at least they have. I didn’t want to blog until I had finished a certain feature: support for conditional compilation. I demoed the feature at Assembly Summer 2009, but it has now been enhanced! Conditional compilation means that the programmer can control at compile time which parts of the source code will or will not be compiled. Because the CoolBasic V3 compiler optimizes all conditional branches (including If, While, and Until) by stripping branches whose constant condition evaluates to False from the final Intermediate Language, the #If and If statements used to be essentially identical in effect. However, conditional compilation is now evaluated immediately after lexical analysis, so False branches don’t get processed in Pass#1 or Pass#2 at all!

These so-called directive expressions must evaluate to a constant value. Because directive expressions are evaluated before statement parsing, regular constants can’t be resolved in them. Thus, there is no name resolution (as we understand it in OOP); instead, CoolBasic introduces directive constants, which are declared via the #Const statement. These constants are separate from the regular ones you’d declare via the standard Const statement, and the two will not conflict by name. Directive constants are always Private to the file they are declared in. In addition, it will be possible to define global directive constants at project level, but that will be a feature of the code editor.

In contrast to regular constants, directive constants can be declared multiple times with the same name, and a different value can be assigned each time. You can use existing directive constants in directive expressions when declaring other directive constants. Imagine the possibilities for implementing different sound engines, for example. The following example sheds some light on the concept:

#Const DEBUG = True
#If DEBUG
    // This will get compiled, and is available in program
    Function A() 
    EndFunction
#Else
    // This is not compiled, so it is unavailable in program (unless #Const DEBUG = False)
    Function B() 
    EndFunction
#EndIf
#Const DEBUG = Not DEBUG // Redefine the DEBUG directive constant
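To make the directive pass more concrete, here is a minimal Python sketch of how #Const, #If, #Else and #EndIf could be resolved on the line stream right after lexing, so that lines in False branches never reach the later passes. It assumes a deliberately tiny expression language (True/False or a directive constant, optionally prefixed by Not), works on lines rather than tokens, and does not handle #ElseIf. The function names are hypothetical; this is not the actual CoolBasic compiler code.

```python
from __future__ import annotations


def eval_directive(expr: str, consts: dict[str, bool]) -> bool:
    """Tiny evaluator: True/False or a directive constant, optionally prefixed by Not."""
    words = expr.split()
    negate = False
    while words and words[0].lower() == "not":
        negate = not negate
        words = words[1:]
    if len(words) != 1:
        raise SyntaxError(f"unsupported directive expression: {expr!r}")
    word = words[0]
    if word.lower() in ("true", "false"):
        value = word.lower() == "true"
    else:
        value = consts[word.upper()]  # KeyError means an undefined directive constant
    return value != negate


def preprocess(lines: list[str]) -> list[str]:
    """Resolve #Const/#If/#Else/#EndIf for one file; drop lines in False branches."""
    consts: dict[str, bool] = {}  # fresh per file: directive constants are Private
    stack: list[tuple[bool, bool]] = []  # (a branch was already taken, enclosing block active)
    active = True
    kept: list[str] = []
    for line in lines:
        code = line.split("//", 1)[0].strip()  # ignore trailing // comments
        keyword, _, rest = code.partition(" ")
        keyword = keyword.lower()
        if keyword == "#const":
            if active:  # a #Const inside a dead branch has no effect
                name, _, expr = rest.partition("=")
                # Redefinition is allowed: the new value simply replaces the old one.
                consts[name.strip().upper()] = eval_directive(expr, consts)
        elif keyword == "#if":
            taken = active and eval_directive(rest, consts)
            stack.append((taken, active))
            active = taken
        elif keyword == "#else":
            taken, parent = stack.pop()
            active = parent and not taken
            stack.append((True, parent))
        elif keyword == "#endif":
            _, parent = stack.pop()
            active = parent
        elif active:
            kept.append(line)
    if stack:
        raise SyntaxError("missing #EndIf")
    return kept
```

Fed the example above, only `Function A()` and its `EndFunction` survive; flipping the first line to `#Const DEBUG = False` keeps `Function B()` instead.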

As directive expressions are processed so early, I had to write yet another postfix converter and calculator for the purpose. God, that was boring. At least now it’s done. Next, I should be able to start writing statement-specific parsers and general-purpose parsers.
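A postfix converter and calculator of this kind can be sketched with the classic shunting-yard algorithm: the directive expression is first converted to postfix (Reverse Polish) notation and then evaluated with a value stack. The operator set, the precedence levels (borrowed from Visual Basic: arithmetic, then comparison, then Not, And, Or), and treating And/Or/Not as logical rather than bitwise are all assumptions of this Python sketch; unary minus is not supported, and the function names are hypothetical.

```python
from __future__ import annotations

import operator
import re

# Hypothetical operator table: name -> (precedence, implementation).
BINARY = {
    "or": (1, lambda a, b: bool(a) or bool(b)),
    "and": (2, lambda a, b: bool(a) and bool(b)),
    "=": (4, operator.eq),
    "<>": (4, operator.ne),
    "+": (5, operator.add),
    "-": (5, operator.sub),
    "*": (6, operator.mul),
}
NOT_PRECEDENCE = 3  # Not binds looser than comparisons, tighter than And/Or

TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|<>|[()=+\-*])")


def tokenize(expr: str) -> list[str]:
    expr = expr.rstrip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN_RE.match(expr, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at {pos} in {expr!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def precedence(op: str) -> int:
    return NOT_PRECEDENCE if op == "not" else BINARY[op][0]


def to_postfix(tokens: list[str]) -> list[str]:
    """Shunting-yard: infix token list -> postfix token list."""
    output, ops = [], []
    for tok in tokens:
        low = tok.lower()
        if low == "not":
            ops.append(low)  # prefix operator: nothing to its left to pop yet
        elif low in BINARY:
            # Left-associative: first pop operators of equal or higher precedence.
            while ops and ops[-1] != "(" and precedence(ops[-1]) >= precedence(low):
                output.append(ops.pop())
            ops.append(low)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops and ops[-1] != "(":
                output.append(ops.pop())
            if not ops:
                raise SyntaxError("unbalanced ')'")
            ops.pop()  # discard the matching '('
        else:
            output.append(tok)  # operand: number, True/False or directive constant
    while ops:
        op = ops.pop()
        if op == "(":
            raise SyntaxError("unbalanced '('")
        output.append(op)
    return output


def evaluate(postfix: list[str], consts: dict[str, object]) -> object:
    """Evaluate a postfix token list with a value stack."""
    stack: list[object] = []
    for tok in postfix:
        if tok == "not":
            stack.append(not stack.pop())
        elif tok in BINARY:
            right, left = stack.pop(), stack.pop()
            stack.append(BINARY[tok][1](left, right))
        elif tok.isdigit():
            stack.append(int(tok))
        elif tok.lower() in ("true", "false"):
            stack.append(tok.lower() == "true")
        else:
            stack.append(consts[tok.upper()])  # KeyError: undefined directive constant
    if len(stack) != 1:
        raise SyntaxError("malformed directive expression")
    return stack[0]
```

Splitting conversion and evaluation keeps each step simple: precedence and parentheses are handled once in `to_postfix`, and `evaluate` never needs to look ahead.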

I may (or may not) have a mega surprise next time. Stay tuned.

Introducing the new Lexer

Last time I showed you some test results on the performance of the CoolBasic V3 compiler. The numbers weren’t too appealing when the compiler was fed a huge, special kind of test file. The structure of that particular file amplified some compiler bottlenecks, which resulted in a slower compile time than expected. All this got me thinking about more optimized ways to do lexical analysis, parsing, analysis and code transformation. In my previous post I introduced some ideas on how to fix those issues, and the modifications to the lexer are now in place.

The lexer is executed at the beginning of the compilation process. It’s responsible for identifying tokens in the file stream and saving them to a linked list for later processing. This new implementation uses the new fast FileReader, and it features full widechar conversion and Unicode support. The lexer now also utilizes the new GlobalMemoryCache system and the super-fast linked list system. In addition, the compiler now runs in both 32-bit and 64-bit mode. So, lots of new technologies have already been implemented, and the foundation for the parser and analyzer is finally materializing. It wasn’t easy to get here, though.

My first idea was to use hand-written Assembly code to jump between different lexer states without actually needing to test any conditions. Basically, the FileReader just performed an asm jmp to a label address that was dynamically stored in a variable. It was faster than a procedure call, and certainly faster than a procedure call plus a set of conditions to test the lexer state against, but I found out the hard way that this wasn’t the safest way of controlling program flow: I experienced some odd side effects whose cause I couldn’t explain, so I eventually threw the Assembly code out of the window and did it the “right” ™ way. In addition, in 64-bit mode you’d have to use the qword modifier to perform the asm jmp anyway, which for some reason is slower than in 32-bit mode; in fact, it would’ve been a bit slower than a standard procedure call.
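Python has no computed goto, so the closest portable analog of “jumping directly to a token handler” is a lookup table from the current character to a handler function, which is also the procedure-call style the post settles on. The sketch below shows that table-driven dispatch in place of a chain of state tests; the handler names and the small ASCII-only token set are hypothetical, and this is not the actual CoolBasic lexer.

```python
from __future__ import annotations

import string
from typing import Callable

Handler = Callable[[str, int, list], int]


def scan_while(src: str, pos: int, accept: Callable[[str], bool]) -> int:
    """Advance pos while accept() holds; return the first rejected position."""
    while pos < len(src) and accept(src[pos]):
        pos += 1
    return pos


def lex_identifier(src: str, pos: int, out: list) -> int:
    end = scan_while(src, pos, lambda c: c.isalnum() or c == "_")
    out.append(("ident", src[pos:end]))
    return end


def lex_number(src: str, pos: int, out: list) -> int:
    is_digit = lambda c: c in string.digits
    end = scan_while(src, pos, is_digit)
    if end < len(src) and src[end] == ".":
        end = scan_while(src, end + 1, is_digit)
        out.append(("float", src[pos:end]))
    else:
        out.append(("int", src[pos:end]))
    return end


def lex_space(src: str, pos: int, out: list) -> int:
    return scan_while(src, pos, lambda c: c in " \t")  # whitespace emits no token


def lex_operator(src: str, pos: int, out: list) -> int:
    pair = src[pos:pos + 2]
    if pair in ("<>", "<=", ">="):
        out.append(("op", pair))
        return pos + 2
    out.append(("op", src[pos]))
    return pos + 1


# Dispatch table built once: the first character selects the handler directly,
# instead of testing a chain of conditions for every token.
DISPATCH: dict[str, Handler] = {}
DISPATCH.update(dict.fromkeys(string.ascii_letters + "_", lex_identifier))
DISPATCH.update(dict.fromkeys(string.digits, lex_number))
DISPATCH.update(dict.fromkeys(" \t", lex_space))
DISPATCH.update(dict.fromkeys("+-*/=<>(),", lex_operator))


def lex(src: str) -> list:
    tokens: list = []
    pos = 0
    while pos < len(src):
        handler = DISPATCH.get(src[pos])
        if handler is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        pos = handler(src, pos, tokens)
    return tokens
```

For example, `lex("a<>b")` dispatches three times: once to `lex_identifier`, once to `lex_operator` (which consumes both characters of `<>`), and once more to `lex_identifier`.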

During my “happy” Assembly experimentation I experienced very slow compilation (and probably lost some hair and/or beard because of it). At first I thought it was because of my bad asm cooking, but the problem persisted even after I removed all hand-written Assembly from the source. After lots of digging and peeking, I finally found the cause: it was just one little procedure that gets called only once per line, but which took 0.8 milliseconds to complete every time. For big test files (11,000+ lines) this adds up to roughly 11,000 × 0.8 ms ≈ 8.8 seconds. It took me countless hours and lots of cursing, but once this little issue had been identified and properly handled, the lexer was finally reborn.

The lexer is now indeed very quick; I’d say I achieved my goal. The difference compared to the previous V3 lexer (which was also quite fast for “normal” code, i.e. code somebody would actually write, as opposed to artificially generated, absurdly large test files) is notable: the new lexer outperforms the previous V3 lexer by up to 6X! The performance was measured with 5 artificially generated, relatively large test files as follows:

Test file 1:
File size: 4.5 MB
Physical code lines: 11,000
Average line length: 1,100 characters
Consists of keywords, identifiers (6-10 characters each), integral and decimal numbers (5-7 characters each), parentheses and a wide range of different operators
1.6 million tokens
File read + full lexical analysis: 0.36 seconds on average

Test file 2:
File size: 12.2 MB
Physical code lines: 11,000
Line length: 1,137 characters
Consists of identifiers (7 characters each)
1.6 million tokens
File read + full lexical analysis: 1.54 seconds on average

Test file 3:
File size: 12.5 MB
Physical code lines: 11,000
Line length: 1,163 characters
Consists of integer numbers alone (6 digits each)
1.8 million tokens
File read + full lexical analysis: 0.31 seconds on average

Test file 4:
File size: 14.3 MB
Physical code lines: 11,000
Line length: 1,329 characters
Consists of double-precision decimal numbers alone (6 digits + decimal separator each)
1.8 million tokens
File read + full lexical analysis: 0.39 seconds on average

Test file 5:
File size: 12.1 MB
Physical code lines: 11,000
Average line length: 1,150 characters
Consists of random language keywords (83 different ones)
1.4 million tokens
File read + full lexical analysis: 0.87 seconds on average

Even though most processing is done in memory, the lexer doesn’t read multi-megabyte files into memory all at once. This keeps memory pressure within manageable bounds.
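One way to keep memory bounded while still decoding multi-byte text is to read the file in fixed-size byte chunks and feed them through an incremental decoder, so only one chunk (plus at most one split character) is held at a time. The post doesn’t describe the FileReader’s actual buffering; `read_chunks` and the 64 KiB default chunk size are hypothetical choices for this Python sketch.

```python
import codecs
import io
from typing import BinaryIO, Iterator


def read_chunks(stream: BinaryIO, encoding: str = "utf-8",
                chunk_size: int = 64 * 1024) -> Iterator[str]:
    """Yield decoded text piece by piece; at most chunk_size bytes are read at a time."""
    decoder = codecs.getincrementaldecoder(encoding)()
    while True:
        raw = stream.read(chunk_size)
        if not raw:
            break
        # The incremental decoder buffers a multi-byte character that is split
        # across two chunks instead of failing on it.
        text = decoder.decode(raw)
        if text:
            yield text
    tail = decoder.decode(b"", final=True)  # raises if the file ends mid-character
    if tail:
        yield tail


if __name__ == "__main__":
    source = io.BytesIO("Päivää, maailma!\n".encode("utf-8") * 3)
    for piece in read_chunks(source, chunk_size=8):
        print(repr(piece))
```

A lexer can consume these pieces one by one and keep only its own state between them, so peak memory depends on the chunk size rather than on the file size.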

And in the end…
It’s now possible to use the underscore as a line continuation character, thus the following code is considered to be one logical line:

Function MyLongFunc( _
    parameter1 As Integer, _
    parameter2 As Integer, _
    parameter3 As Integer, _
    parameter4 As Integer, _
    parameter5 As Integer _
)
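Here is a minimal Python sketch of how physical lines might be folded into logical lines before lexing. It assumes (as in Visual Basic) that the underscore must be preceded by whitespace and must be the last character on the line. Joining the pieces with a single space is a simplification, and the function name is hypothetical. Keeping the first line number lets error messages point back at the original source.

```python
from __future__ import annotations


def logical_lines(physical: list[str]) -> list[tuple[int, str]]:
    """Join lines ending in a whitespace-preceded '_' into (first line number, text) pairs."""
    result: list[tuple[int, str]] = []
    parts: list[str] = []
    start = 0
    for number, line in enumerate(physical, start=1):
        if not parts:
            start = number  # first physical line of a new logical line
        stripped = line.rstrip()
        # Require whitespace before '_' so an identifier such as my_ is not a continuation.
        continues = stripped == "_" or stripped.endswith((" _", "\t_"))
        if continues:
            parts.append(stripped[:-1].strip())
            continue
        parts.append(stripped.strip())
        result.append((start, " ".join(p for p in parts if p)))
        parts = []
    if parts:
        raise SyntaxError(f"line {start}: line continuation at end of file")
    return result
```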

Improved Compiler Performance

Hello again! After 3 weeks of hard writing, my thesis is nearly finished. It only needs a few more editing rounds, so I can finally return to work on the CoolBasic V3 compiler. I’m sorry to say this, but I still have lots of work to do before I can even think about making the compiler available to anyone, including the DevTeam. As always in software engineering, surprising things tend to pop up, and almost every time they result in extended development time and higher cost. Fortunately I’m not under the pressure of any deadlines, nor do I have to pay anything, but this is going to take considerably more time than originally intended. Let me explain why.

Before my break, I conducted some stress tests on the compiler. I artificially generated a huge source file consisting of nearly 10,000 symbols and close to 2 million tokens. The average line length was around 1,000 characters. Those lines contained overly complicated expressions no one would ever write anyway, but this kind of code should be a good benchmark for the true performance of the compiler. Now, even though the CoolBasic V3 compiler is quite fast, this particular structure of the test program exaggerated some of the compiler’s characteristics, causing a cumulative increase in compile time.

The huge test program compiled in 12 seconds. Needless to say, I was quite surprised. I knew this code could take a few moments to compile, but to be honest, I didn’t expect it to take that long. I found this intriguing, so I quickly ported the exact same code over to VB.NET and ran the VB.NET compiler manually from the command line. The process completed in 2 seconds, meaning CoolBasic was 6 times slower than the VB.NET compiler. I’m too much of a perfectionist to let this be, because I know I can do better than that.

At this point I’d like to emphasize that “normal code”, i.e. a natural mix of “normal length” code lines, empty lines and indentation, as well as a wider variety of statement kinds, would probably bring the CoolBasic compiler closer to the VB.NET compiler. However, if I learn that there’s a potential problem, I will fix it, even though nobody would ever write this kind of absurd code. That’s an indication of quality. The mere fact that something awful is possible troubles me. Now, someone could think that this is all a big waste of time, but the truth is that this is one of those things you’d better fix as early as possible, because *if* somebody someday finds it, you’d be screwed.

So I started to map which parts of the compilation process took longest to complete. The results amazed me: those 12 seconds broke down into three phases, namely lexing (3.0 seconds), parsing (0.9 seconds), and code analysis & transformation (8.5 seconds). I was most surprised by the lexer, which munched about one quarter of the entire process. After further investigation it occurred to me that the root of all evil was the file reading process (now what’d ya know!). I never saw that one coming. The compiler reads one line at a time from the source file and then passes it to the lexer. OK, I was aware that this might not be the fastest way possible, but I didn’t think it’d make much difference.
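The reported phase timings add up as follows, which is where the “one quarter” figure comes from (the sum is slightly above the rounded 12 seconds):

```latex
\begin{aligned}
t_{\text{total}} &= t_{\text{lex}} + t_{\text{parse}} + t_{\text{analyze}}
  = 3.0\,\text{s} + 0.9\,\text{s} + 8.5\,\text{s} = 12.4\,\text{s} \\
\frac{t_{\text{lex}}}{t_{\text{total}}} &= \frac{3.0}{12.4} \approx 0.24,
\qquad
\frac{t_{\text{analyze}}}{t_{\text{total}}} = \frac{8.5}{12.4} \approx 0.69
\end{aligned}
```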

The lexer needs to scan every character individually, using PureBasic’s Mid function. It turns out that with exceptionally long strings (like the 1000+ character lines here), the Mid function underperforms. I wrote my own in assembly and managed to squeeze out some extra speed, but the lexing process still took, in my opinion, too long. Eventually I found out that PureBasic’s other string functions were too slow for the job as well. So I started brainstorming alternative ways of implementing the file reader and lexer.

Another bitter surprise was finding out that PureBasic’s in-built linked list support (something the CoolBasic V3 compiler makes great use of) is not fully optimized for high performance, at least not for my needs. For those 2 million tokens, there will be hundreds of millions of accesses, iterations, deletions, and additions on those lists, which ultimately add up to slowish compilation for very lengthy expressions (such as those in the test program).
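The post doesn’t say how the replacement linked list system works. One common design for allocation-heavy token lists, shown here as a hypothetical Python sketch, stores nodes in preallocated parallel arrays and recycles freed slots through a free list, so insertions and deletions are O(1) and never allocate per node (when the pool runs out, its capacity doubles). In PureBasic or C the arrays would be raw memory blocks; Python only illustrates the bookkeeping, not the speed.

```python
class PooledList:
    """Doubly linked list stored in parallel arrays; freed slots are recycled."""

    NIL = -1

    def __init__(self, capacity: int = 16) -> None:
        capacity = max(1, capacity)
        self.value = [None] * capacity
        self.prev = [self.NIL] * capacity
        # Unused slots form a free list threaded through the next array.
        self.next = list(range(1, capacity)) + [self.NIL]
        self.free = 0
        self.head = self.tail = self.NIL
        self.size = 0

    def _grow(self) -> None:
        old = len(self.value)
        new = old * 2
        self.value.extend([None] * (new - old))
        self.prev.extend([self.NIL] * (new - old))
        self.next.extend(list(range(old + 1, new)) + [self.NIL])
        self.free = old  # the free list was empty, so it now starts at the new block

    def _alloc(self, value) -> int:
        if self.free == self.NIL:
            self._grow()
        node = self.free
        self.free = self.next[node]
        self.value[node] = value
        return node

    def append(self, value) -> int:
        node = self._alloc(value)
        self.prev[node], self.next[node] = self.tail, self.NIL
        if self.tail == self.NIL:
            self.head = node
        else:
            self.next[self.tail] = node
        self.tail = node
        self.size += 1
        return node

    def insert_after(self, node: int, value) -> int:
        new = self._alloc(value)
        after = self.next[node]
        self.prev[new], self.next[new] = node, after
        self.next[node] = new
        if after == self.NIL:
            self.tail = new
        else:
            self.prev[after] = new
        self.size += 1
        return new

    def remove(self, node: int) -> None:
        before, after = self.prev[node], self.next[node]
        if before == self.NIL:
            self.head = after
        else:
            self.next[before] = after
        if after == self.NIL:
            self.tail = before
        else:
            self.prev[after] = before
        self.value[node] = None
        self.next[node] = self.free  # push the slot onto the free list
        self.free = node
        self.size -= 1

    def __iter__(self):
        node = self.head
        while node != self.NIL:
            yield self.value[node]
            node = self.next[node]
```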

Sure, there are some things I could have done a bit differently that would’ve helped to some extent, but in the end not much. I’m a little upset about all this, because for the most part it’s not my fault. I learned an important lesson though: if you want it done right, do it yourself! While I was writing my thesis, I hatched a few ideas I believe will make a great difference. Perhaps they will bring the CoolBasic V3 compiler very close to the VB.NET compiler (that has now become the goal). Below I have listed some of the actions I’m taking to substantially improve the performance of the CoolBasic V3 compiler (and I think they’re well worth the extra time it takes to implement them):

Full x64 Support
To be honest, it’s better to implement this now. Otherwise I would have to do it later anyway, which would probably be too difficult and would most likely force the entire compiler to be rewritten from scratch. That in turn would probably introduce a good number of new bugs, leaving end users not so happy. 64-bit support is quite an exciting feature, although the runtime engine probably won’t support it immediately. Unlike when I started the CoolBasic project, I now have 64-bit Windows Vista at my disposal, making it possible to develop and run 64-bit code.

Full Unicode Support
This is another thing that needs to be taken into account at a very early stage of development. Not only does Unicode support enable the compiler to read ANSI, UTF-8, and UTF-16 (both little-endian and big-endian) encoded files, it also ensures that all precompiled modules are compatible with each other. This also means that strings in CoolBasic V3 take up 2 bytes per character. And it’s now finally possible to use virtually any localized character in source file identifiers, comments, and so on.
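For illustration, here is one common way a reader could tell those encodings apart: checking the file’s byte order mark (BOM). This is a minimal C sketch that assumes BOM-based detection, with files lacking a BOM treated as ANSI; the enum and function names are hypothetical and not taken from the CoolBasic compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding tags; not the compiler's actual names. */
typedef enum {
    ENC_ANSI,     /* no BOM found: fall back to the system code page */
    ENC_UTF8,     /* BOM bytes EF BB BF */
    ENC_UTF16LE,  /* BOM bytes FF FE */
    ENC_UTF16BE   /* BOM bytes FE FF */
} Encoding;

/* Inspects the first bytes of a file buffer, returns the detected encoding,
   and stores in *bom_len how many BOM bytes precede the actual text. */
Encoding detect_encoding(const unsigned char *buf, size_t len, size_t *bom_len)
{
    assert(buf != NULL || len == 0);
    assert(bom_len != NULL);
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return ENC_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return ENC_UTF16LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return ENC_UTF16BE;
    }
    *bom_len = 0;
    return ENC_ANSI;
}
```

Note that a UTF-8 file saved without a BOM falls through to ANSI here; a real reader might add a validity check for that case.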

Hi-performance FileReader
The huge test file was 4.5 megabytes. On my laptop, reading every line and then scanning each character individually took 3.4 seconds in total. Nothing was done with the characters beyond reading them, which is what makes the elapsed time so alarming. The new file reader, however, performed the exact same task in 0.04 seconds, an 85x improvement (3.4 / 0.04 = 85), and that’s including the additional widechar conversion. Also, this time there will be no string operations during the entire V3 compilation process. Even string comparison during name resolution is done with a hi-performance smart selection algorithm and the memory API.
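The post doesn’t reveal how the new FileReader works. A common way to get this kind of speed-up is to load the whole file with one bulk read and then walk the buffer directly, so the minimal C sketch below does exactly that and widens the bytes to 16-bit characters in a single pass. The function names are hypothetical, and the widening step assumes Latin-1 input for simplicity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads an entire file into one heap buffer with a single fread call,
   instead of fetching it line by line. Returns NULL on failure;
   the caller frees the buffer. */
unsigned char *read_whole_file(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }
    unsigned char *buf = malloc((size_t)size + 1);
    if (buf == NULL) { fclose(f); return NULL; }
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) { free(buf); return NULL; }
    buf[got] = '\0';
    *out_len = got;
    return buf;
}

/* Widens 8-bit text to 16-bit code units in one pass over the buffer.
   Simplification: each byte is zero-extended, which is only correct for
   Latin-1; a real ANSI reader would convert via the active code page. */
uint16_t *widen(const unsigned char *src, size_t len)
{
    assert(src != NULL || len == 0);
    uint16_t *dst = malloc((len + 1) * sizeof *dst);
    if (dst == NULL) return NULL;
    for (size_t i = 0; i < len; i++) dst[i] = src[i];
    dst[len] = 0;
    return dst;
}
```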

Hi-performance Lexer
With the change to the FileReader, the lexer also needs to be rewritten. It will now be based on state information rather than on the regular-expression-like algorithm used before. I’m injecting manual assembly in order to jump directly to the different token handlers. Obviously, this breaks a number of “best practices”, but the speed gain is notable enough to justify it. The lexer also features a dynamically reallocatable text buffer, which makes building identifiers, keywords, and strings very fast. Some additional testing still needs to be done, but overall I expect a significant speed gain over the previous lexer.
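The post uses hand-injected assembly to jump straight to token handlers. In C, a `switch` over a small, dense set of states is often compiled into a jump table, so the sketch below uses one instead. It’s a minimal illustration of a state-driven lexer with a dynamically reallocated text buffer, not the CoolBasic lexer itself; the token kinds, states, and `TextBuf` type are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token kinds for this sketch. */
typedef enum { TOK_IDENT, TOK_NUMBER, TOK_STRING, TOK_SYMBOL } TokKind;

/* Growable text buffer: capacity doubles when full, so appending a
   character costs amortized O(1) and no per-character strings are made. */
typedef struct { char *data; size_t len, cap; } TextBuf;

static void tb_clear(TextBuf *tb)
{
    tb->len = 0;
    if (tb->data) tb->data[0] = '\0';
}

static void tb_push(TextBuf *tb, char c)
{
    if (tb->len + 1 >= tb->cap) {
        size_t cap = tb->cap ? tb->cap * 2 : 16;
        char *p = realloc(tb->data, cap);
        if (p == NULL) abort();
        tb->data = p;
        tb->cap = cap;
    }
    tb->data[tb->len++] = c;
    tb->data[tb->len] = '\0';
}

typedef enum { ST_START, ST_IDENT, ST_NUMBER, ST_STRING } State;

/* Single pass over src, driven by the current state. Stores up to max
   token kinds in kinds[] and returns the total number of tokens.
   tb holds the text of the token being built (the last one on return). */
size_t lex(const char *src, TokKind *kinds, size_t max, TextBuf *tb)
{
    State st = ST_START;
    size_t count = 0;
    const char *p = src;
    for (;;) {
        unsigned char c = (unsigned char)*p;
        switch (st) {
        case ST_START:
            if (c == '\0') return count;
            tb_clear(tb);
            if (isspace(c)) p++;
            else if (isalpha(c) || c == '_') st = ST_IDENT;
            else if (isdigit(c)) st = ST_NUMBER;
            else if (c == '"') { st = ST_STRING; p++; }
            else {  /* single-character operator or punctuation */
                tb_push(tb, (char)c);
                p++;
                if (count < max) kinds[count] = TOK_SYMBOL;
                count++;
            }
            break;
        case ST_IDENT:
        case ST_NUMBER:
            if (st == ST_IDENT ? (isalnum(c) || c == '_') : isdigit(c)) {
                tb_push(tb, (char)c);
                p++;
            } else {  /* token ends here; c is not consumed */
                if (count < max) kinds[count] = (st == ST_IDENT) ? TOK_IDENT : TOK_NUMBER;
                count++;
                st = ST_START;
            }
            break;
        case ST_STRING:
            if (c != '"' && c != '\0') {
                tb_push(tb, (char)c);
                p++;
            } else {  /* closing quote, or end of input for an unterminated string */
                if (c == '"') p++;
                if (count < max) kinds[count] = TOK_STRING;
                count++;
                st = ST_START;
            }
            break;
        }
    }
}
```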

Hi-performance TokenList
While I was brainstorming, I suddenly got this awesome idea of implementing my own linked list as a replacement for PureBasic’s slowish implementation. The token list is one of the most used entities during compilation, and its speed is vital for performing well with large projects (such as complex games). I was eager to test my theory, so I wrote a bunch of linked list functions, basically forming the foundation of an entire linked list library. Its features are much more diverse and precise than those of PureBasic’s linked lists, offering functions such as AddAfter, AddBefore, AddFirst, AddLast, and RemoveRange.
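The algorithm behind the CoolBasic list is kept secret, so the C sketch below is only a conventional doubly linked list with head and tail pointers, meant to show what operations like AddAfter, AddBefore, AddFirst, AddLast, and RemoveRange do. The snake_case names are hypothetical stand-ins for those functions. Keeping a tail pointer is what lets appending run in constant time without moving any “current element” pointer.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node { struct Node *prev, *next; int token; } Node;
typedef struct { Node *head, *tail; size_t count; } List;

static Node *node_new(int token)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->prev = n->next = NULL;
    n->token = token;
    return n;
}

/* Inserts a new node directly after an existing one. */
Node *add_after(List *l, Node *at, int token)
{
    assert(at != NULL);
    Node *n = node_new(token);
    n->prev = at;
    n->next = at->next;
    if (at->next) at->next->prev = n; else l->tail = n;
    at->next = n;
    l->count++;
    return n;
}

/* Inserts a new node directly before an existing one. */
Node *add_before(List *l, Node *at, int token)
{
    assert(at != NULL);
    Node *n = node_new(token);
    n->next = at;
    n->prev = at->prev;
    if (at->prev) at->prev->next = n; else l->head = n;
    at->prev = n;
    l->count++;
    return n;
}

Node *add_first(List *l, int token)
{
    if (l->head) return add_before(l, l->head, token);
    Node *n = node_new(token);  /* first node of an empty list */
    l->head = l->tail = n;
    l->count = 1;
    return n;
}

/* O(1) append thanks to the tail pointer. */
Node *add_last(List *l, int token)
{
    if (l->tail) return add_after(l, l->tail, token);
    return add_first(l, token);
}

/* Removes every node from first to last inclusive (first must not come
   after last): the run is spliced out in O(1), then its nodes are freed. */
void remove_range(List *l, Node *first, Node *last)
{
    assert(first != NULL && last != NULL);
    Node *before = first->prev, *after = last->next;
    if (before) before->next = after; else l->head = after;
    if (after) after->prev = before; else l->tail = before;
    Node *n = first;
    for (;;) {
        Node *next = n->next;
        int done = (n == last);
        free(n);
        l->count--;
        if (done) break;
        n = next;
    }
}
```

RemoveRange is the interesting one here: the whole run is unlinked with two pointer updates before the detached nodes are freed.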

The test program had a linked list of 2,000,000 token elements, so the estimated memory usage was 46 megabytes for the 32-bit version. During the test, 700,000,000 element additions and deletions were made in total, targeting various locations within the list: some at the beginning, some in the middle, and some at the end. The CoolBasic V3 list implementation completed the task in 17.0 seconds on average, whereas PureBasic’s equivalent test program took 34.3 seconds on average. This means that CoolBasic’s version is just over 2X faster. In addition, peak memory usage was 52 megabytes for the CoolBasic V3 compiler and 72 megabytes for PureBasic. I was happy.

If we simply test how fast the system walks through the list, without making any allocations or deallocations, PureBasic catches up: the CoolBasic compiler is now “only” 1.6X faster. Fully iterating a 2,000,000-token linked list 1,000 times took 17.1 seconds on average for CoolBasic and 27.2 seconds on average for PureBasic. The algorithm behind this hi-performance linked list will remain my little secret.

Hi-performance SymbolTable
The second most used entity during the compilation process is without a doubt the symbol table; there can be millions of queries to it. I admit I could’ve designed its iteration to be more efficient in the first place, but the symbol table is also linked-list based, and thus doomed to underperform at a task that I know can be made a lot faster. Therefore, using the new linked list technology described earlier, I can not only speed up regular manipulation of the list, but also do some things that were simply not possible before. For example, I no longer have to change the list pointer every time I wish to add an element at the end of the list.

I’m also going to redesign the structure of the symbol table. Although it still consists of a single linked list, each node can now store detailed information about its children as well as predetermined pointers to the next and previous nodes of the same parent. This technique ensures that only the elements that actually need to be examined are visited during iteration, which, combined with the new hi-performance resolvers, should result in swift table lookups. The new structure hasn’t fully taken its final shape yet, but eliminating a big part of the search time every time an identifier is encountered ought to have a great impact on performance in general.
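To make the sibling-pointer idea concrete, here is a minimal C sketch of symbol nodes that link to their first and last child and to their previous and next sibling under the same parent, so a lookup only visits the members of the scope being searched. It’s an assumption-based illustration, not the redesigned CoolBasic symbol table: the global list links are omitted, the field names are hypothetical, and `strcmp` stands in for the memory-API comparison mentioned earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol node. In the design described above the nodes also
   live in one global linked list; those links are left out here. */
typedef struct Sym {
    const char *name;
    struct Sym *parent;
    struct Sym *first_child, *last_child;
    struct Sym *prev_sibling, *next_sibling;
} Sym;

/* Attaches child under parent in O(1) via the last_child pointer. */
void sym_attach(Sym *parent, Sym *child)
{
    assert(parent != NULL && child != NULL);
    child->parent = parent;
    child->next_sibling = NULL;
    child->prev_sibling = parent->last_child;
    if (parent->last_child) parent->last_child->next_sibling = child;
    else parent->first_child = child;
    parent->last_child = child;
}

/* Looks up a name among the direct children of one scope only;
   symbols belonging to other scopes are never visited. */
Sym *sym_find_child(const Sym *scope, const char *name)
{
    assert(scope != NULL && name != NULL);
    for (Sym *s = scope->first_child; s != NULL; s = s->next_sibling)
        if (strcmp(s->name, name) == 0) return s;
    return NULL;
}
```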

Hi-performance GlobalMemoryCache
Manual memory allocation is a crucial part of the V3 compiler. This process can be improved by packing the data tightly into larger memory “pages”. Actual allocations then occur less often, and the stored information is easier to transfer and cross-link between modules. Also, when a compile job finishes, it’s easier for me to deallocate everything with a single call rather than iterating through everything and deallocating pieces here and there.
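The page-based scheme described above is commonly known as an arena (or region) allocator. Below is a minimal C sketch under that assumption: small allocations are carved out of large pages, oversized requests get a page of their own, and everything is released with one call. The 64 KB page size and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE 65536u             /* hypothetical page size in bytes */
#define ALIGN _Alignof(max_align_t)  /* strictest fundamental alignment (C11) */

/* One page: a header followed by its payload. Declaring the payload as
   max_align_t[] keeps the start of data suitably aligned for any type. */
typedef struct Page {
    struct Page *next;
    size_t used, cap;
    max_align_t data[];
} Page;

typedef struct { Page *pages; } Arena;

/* Returns size bytes carved out of the newest page, allocating a new page
   only when the current one cannot fit the request. */
void *arena_alloc(Arena *a, size_t size)
{
    assert(a != NULL);
    size = (size + ALIGN - 1) / ALIGN * ALIGN;  /* keep every block aligned */
    Page *p = a->pages;
    if (p == NULL || p->cap - p->used < size) {
        size_t cap = size > PAGE_SIZE ? size : PAGE_SIZE;
        p = malloc(sizeof(Page) + cap);
        if (p == NULL) return NULL;
        p->next = a->pages;
        p->used = 0;
        p->cap = cap;
        a->pages = p;
    }
    void *mem = (unsigned char *)p->data + p->used;
    p->used += size;
    return mem;
}

/* Releases everything allocated from the arena in one pass over the pages. */
void arena_free_all(Arena *a)
{
    assert(a != NULL);
    Page *p = a->pages;
    while (p != NULL) {
        Page *next = p->next;
        free(p);
        p = next;
    }
    a->pages = NULL;
}
```

The trade-off is that individual blocks can’t be freed on their own, which fits a compile job whose data all dies at the same moment.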

Hi-performance Resolvers
Currently, CoolBasic V3 name resolution differs a little from VB.NET/C# name resolution. It was a conscious decision to let the resolving process continue even when an inaccessible branch was found, because a more accessible option could exist at the upper levels. Due to this design, the resolvers were implemented as recursive functions. They needed to pass information between calls in a linked list, which has proved to be inefficient in PureBasic. For the sake of performance, however, this no longer seems like a good solution, given that VB.NET and C# operate differently in this case. So I’m probably going to change this technique, which should relieve some burden from this core compiler functionality. While doing so, I hope I can combine the type resolver and the general identifier resolver into one, more manageable entity. This change will also bring a number of other benefits, but I don’t wish to turn this blog post into any more technical babble than it already is.
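As an illustration of replacing recursion with a loop, here is a minimal C sketch that walks scopes from the innermost one outward. The `skip_inaccessible` flag switches between the current V3 behavior described above (remember an inaccessible match and keep searching) and a stricter variant that stops at the first match by name. All types and names are hypothetical, and the sketch doesn’t attempt to model the exact VB.NET or C# lookup rules.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; bool accessible; } Entry;

typedef struct Scope {
    const struct Scope *parent;  /* enclosing scope, NULL at the top level */
    const Entry *entries;
    size_t count;
} Scope;

typedef enum { RES_FOUND, RES_INACCESSIBLE, RES_NOT_FOUND } ResStatus;

/* Iterative outward walk: no recursion, and no list of partial results
   passed between calls. *out receives the matching entry, if any. */
ResStatus resolve(const Scope *scope, const char *name,
                  bool skip_inaccessible, const Entry **out)
{
    assert(name != NULL && out != NULL);
    const Entry *blocked = NULL;  /* first inaccessible match seen */
    for (const Scope *s = scope; s != NULL; s = s->parent) {
        for (size_t i = 0; i < s->count; i++) {
            const Entry *e = &s->entries[i];
            if (strcmp(e->name, name) != 0) continue;
            if (e->accessible) { *out = e; return RES_FOUND; }
            if (!skip_inaccessible) { *out = e; return RES_INACCESSIBLE; }
            if (blocked == NULL) blocked = e;  /* keep looking outward */
        }
    }
    *out = blocked;
    return blocked ? RES_INACCESSIBLE : RES_NOT_FOUND;
}
```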

And in the end…
Needless to say, these are all pretty big changes, and yes, the compiler needs a major rewrite to implement all of them. The current V3 compiler is by no means bad, and I’m very proud of it. It’s just that when things could be done better, it’s too hard to leave them as they are, especially since CoolBasic V3 isn’t out yet. Using the knowledge I’ve gained during this journey, the new core shouldn’t take anywhere near as long to complete as the original V3 compiler did. I, too, wish to get V3 out some day, you know 🙂 I think these changes are necessary and well worth the extra time.

Moral of the story? Low-level programming for performance is good; high-level programming for performance is bad. I’m really sorry about the delay, but I couldn’t have known about these weaknesses of PureBasic until it was too late. You want it right, you do it yourself!

The compiler FrontEnd is now finished!

Hello again! As I promised last time, I have two big announcements to make. Even though the summer has been warm and beautiful, I’ve been working very hard on the CoolBasic compiler for the past few weeks. We’re talking about something like 8 hours a day on average, and there were also a few days during which I coded 11 hours pretty much non-stop. And it’s been great! Even though most of the work has been nothing but bug fixes, my motivation hasn’t diminished one bit. Some people might think otherwise, but I actually get encouraged by intriguing bugs that take 24 hours to track down, because it’s very rewarding to finally fix them.

I don’t normally write about bugs I’ve found and fixed, but I’ll make an exception this time: I had a medium-sized architectural issue regarding method invocation and how the reference is passed to methods. Static methods and structure methods add an extra twist. The original implementation used a fixed stack position for the reference, which ultimately proved unworkable for complex dot notations that involve static members in the middle. I fixed the problem by rewriting the part that is responsible for recognizing valid references and injecting them into the final code. References are now always at the same relative location as they appear in the original statement. Even static methods now have them, although those don’t necessarily point to anything. Long and complex dot notation paths in general have been giving me quite a lot of work recently.

The first announcement
I’m proud to announce that I have now finished the Intermediate Language (later referred to as “IL”) entirely. This effectively concludes Pass#2, and thus the compiler FrontEnd is now considered “finished”. It’s been a long road, and reaching this major milestone is a great relief to me. What does this mean in terms of progress? The IL still needs one more transformation, but this time it’ll be the final one. So basically there’s just one more process to write for the compiler, and then I can start working on the runtime. Also, it’s July already, and I think the DevTeam won’t be assembled this summer but perhaps in autumn. There’s just so much work to do, sorry. I’m about a month behind the original schedule. But I’d say that’s normal, because in a program of this nature surprises tend to emerge, some of which force medium-to-large rewrites of existing code. Modifications always affect something else, so extensive testing needs to be done after architectural changes, be they minor or major. Thankfully, I didn’t have to implement major changes to anything (at least not just yet).

I spent 5 days (with quite a few bug discoveries and fixes along the way) writing an extensive array of unit tests. A unit test is basically a standalone fragment of code that feeds known input to one piece of functionality and checks the output; unit tests are used to test program functionality in isolation from the big picture, and each test runs in its own sandbox. These small programs I wrote test all CoolBasic programming language features. After compilation I examined the emitted IL in detail and made sure that everything went well and that the IL was correct. As a result of this process, I gathered interesting data and statistics I’d like to share with you. I think the numbers speak for themselves:

Test Program:

– Code size: 2547 lines in 7 files, 62 KB in total
– Composition: a natural mix of comments, declarations, and executable code, covering all language features
– Symbol table size: 1150 identifiers
– Emitted IL: 7410 lines

Testing Platform:

– HP HDX18-1000EO (Laptop)
– T9400 processor
– 4GB memory
– Windows Vista (64-bit)

Results:

– Number of compilations: 20
– Average compile time: 105 milliseconds
– Fastest compile time: 97 milliseconds

That’s pretty fast compilation considering the complexity of the process. Overall I’m happy with the outcome of that “pressure test”. I’ll be using the same gigantic unit test program to validate everything whenever a new feature gets added later on.

The second announcement
Now that the pressure is mostly gone, there’s something completely different I need to do: I still haven’t finished my thesis, so I’m technically still a student. That’s something I’ll soon need to take care of, as the official deadline for graduation is approaching (and I don’t wish to extend it). As some of you may know, I already work as a professional software engineer, which has taken some time and focus away from my studies. The only thing remaining (and the only reason I’m still in their register as “pending graduation”) is the unfinished thesis. So I’ll take a few weeks off CoolBasic and finish my Bachelor of Business Administration studies in Information Technology. Graduating and receiving the official “shiny paper” will also be a great relief, and after that I can fully focus on CoolBasic once again. I will start the thesis from a blank slate very soon. During this time, however, I will be watching the forums and am available via IRC (nothing new there). My four-week summer holiday starts on 20th July, so I’ll have lots of time to spend on the thesis (although I don’t wish to spend the whole summer indoors). I will also be participating in Assembly Summer 2009 this year!

Some random thoughts
I reviewed the CoolBasic V3 manual. After careful consideration, I came to the conclusion that the black theme is unfitting for material that is to become the primary textual source of information for CoolBasic V3. Although silver text is still very readable on a black background, you can actually read dark text on a white background longer without your eyes growing tired. I’ve taken a more MSDN-like approach, although I plan to enhance the visual look and feel.

It’d be cool to make the online manual somewhat interactive and more dynamic. Features such as user comments, version history, multilinguality, linking to outside sources, and flexible maintainability will basically require a database. Thus, it’s best to separate the online version of the manual from the offline version. I’d still like to use XSLT for the offline manual, but then end users couldn’t view it in Firefox, unless I made the manual very dull-looking by basically ignoring text styling within paragraphs. So there are still some open questions about the offline version, but a database approach for the online manual certainly looks quite promising. Either way, it’s always possible to generate the offline manual directly from the database. Database design, however, will be a challenge!

Enjoy the summer, folks!

CoolBasic Intermediate Language

Perhaps the title of the previous blog entry could’ve been better thought out. For some reason, I keep misreading it as “IL logic is gone”, and I’m not sure if I’m the only one 🙂. In addition, “program logic” is basically the less technical alternative to the term “algorithm”. In the previous post, I was referring to how execution bounces between the “labels” generated by program flow control structures such as conditions and loops. Anyway, for today’s post I also have news about the Intermediate Language, only this time I actually mean how individual lines are executed expression-wise (which can also be referred to as “logic”).

I have now almost finished the functions responsible for generating the high-level instructions from each line of the original source code. This is essentially the basis for the “byte code” used by runtime interpreters. In CoolBasic Intermediate Language, tokens are arranged in a fixed (postfix) order so that they can be processed and calculated very efficiently at runtime. You can learn more about postfix notation in this wiki article.

The following CoolBasic code:

Module M
	Function R()
		Dim i As Integer
		i *= i + (6 / i) - i * 2
	EndFunction
EndModule

… will compile into:

.method public shared m.r ()
{
.size 4
IL_000002: ldloc     m.r.i
IL_000003: ldloc     m.r.i
IL_000004: ldc.i4    6
IL_000005: ldloc     m.r.i
IL_000006: div       
IL_000007: add       
IL_000008: ldloc     m.r.i
IL_000009: ldc.i4    1
IL_000010: shl       
IL_000011: sub       
IL_000012: mul       
IL_000013: stloc     m.r.i
IL_000014: ldc.i4    0
           IL_prog_6_end:
IL_000016: ret       
}

Operator data type pairs don’t show there, but those are to be determined later. Please also note that this is not a complete program. I merely wanted to demonstrate that CoolBasic Intermediate Language generation is quite mature already. You can also see some code optimization in this example (i * 2 is emitted as a left shift by 1). However, there’s much more to it than simply performing binary shifts instead of multiplication or division in certain situations. The CoolBasic compiler is now able to generate the equivalent IL from almost any language feature.
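To show why postfix order is convenient at runtime, here is a minimal stack-machine sketch in C that executes the instruction sequence of `m.r` from the listing above (the trailing `ldc.i4 0`/`ret` pair is left out). The opcode enum, the single-local simplification, and the fixed 64-slot stack are assumptions for illustration; this is not the CoolBasic runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes mirroring the mnemonics in the listing. */
typedef enum { LDLOC, LDC, ADD, SUB, MUL, DIV, SHL, STLOC } Op;
typedef struct { Op op; int arg; } Instr;

/* i *= i + (6 / i) - i * 2, in the same order as the emitted IL. */
const Instr method_r[] = {
    {LDLOC, 0}, {LDLOC, 0}, {LDC, 6}, {LDLOC, 0}, {DIV, 0}, {ADD, 0},
    {LDLOC, 0}, {LDC, 1}, {SHL, 0}, {SUB, 0}, {MUL, 0}, {STLOC, 0}
};
const size_t method_r_len = sizeof method_r / sizeof method_r[0];

/* Runs a postfix instruction sequence against a single local variable.
   Each binary operator pops two operands and pushes the result, so no
   parentheses or precedence rules are needed at run time. */
int run(const Instr *code, size_t n, int local)
{
    int stack[64];
    size_t sp = 0;
    for (size_t pc = 0; pc < n; pc++) {
        assert(sp < 64);  /* room for one more push */
        int a, b;
        switch (code[pc].op) {
        case LDLOC: stack[sp++] = local; break;
        case LDC:   stack[sp++] = code[pc].arg; break;
        case STLOC: assert(sp >= 1); local = stack[--sp]; break;
        default:
            assert(sp >= 2);
            b = stack[--sp];  /* right operand is on top */
            a = stack[--sp];
            switch (code[pc].op) {
            case ADD: a += b; break;
            case SUB: a -= b; break;
            case MUL: a *= b; break;
            case DIV: assert(b != 0); a /= b; break;
            case SHL: a <<= b; break;
            default:  assert(0);
            }
            stack[sp++] = a;
        }
    }
    return local;
}
```

Because each operator finds its operands on top of the stack, the interpreter never has to look at parentheses or precedence; for i = 3 the sequence computes 3 * (3 + 2 - 6) = -3.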

IL generation from expressions also means that most if not all of the error messages (and thus error checks) are done, and that every statement has reached its final design. During this time I have fixed quite a lot of bugs, none of which has proved overwhelming enough to force me to come up with a workaround. There have also been some bigger challenges, such as property behavior with the direct assignment operators (+=, -= etc.), field access with long dot notation paths, constructor calls, and IL generation for special operators (to be revealed later). Minor problems usually take only minutes to solve, but bigger ones can take several hours until I come up with a solution of some sort. Looking back, those “difficult” problems have been solved quite nicely, and considering all the benefits I’ve gained thanks to them, there’s no kludge code in there after all.

In addition to technical coding, I’ve also set up the new web portal framework I’m experimenting with. The CoolBasic website will feature lots of dynamic content in the future, maybe even some kind of publishing interface for admins. I’m also thinking of ways to bind this dynamic content to the development environment. As a small side note, I’ve also spent some time on new icon candidates for CoolBasic files, and at least the Vista versions look quite nice already.

The next blog entry will in all probability be just an announcement of two things: first, that I have finished IL generation entirely. The second I don’t want to talk about yet.

IL logic is done

It’s been about three weeks since my last blog post. Sorry for that; I know some of you take a big interest in what I’m doing and how CoolBasic develops. That said, there’s a good reason why I’ve been so silent: I have been implementing some exciting new features, and I didn’t want to write a wrap-up until all of them were fully done and actually working. As Pass#2 is nearing its end, every statement needs to be fully operational so that it can be transformed into IL. This also means that I had to define the missing constructors and other required methods, and implement special behaviour where it’s needed. For example, strings and arrays tend to require some exceptional processing, all of which, mind you, is invisible to the user. Both arrays and strings work like classes internally, but the syntax of BASIC languages calls for simpler and cleaner ways to express them (when it comes to creating, initializing, and using them). In addition, strings are fully managed; arrays are not. But more about those topics in future blog entries.

The CoolBasic compiler having reached the stage where IL is generated basically means that every statement has now undergone its final parse. Remember when I talked about parsing and analyzing back in Pass#1? Well, this “parsing” continues even after the resolvers have been executed. Pass#1 merely certified that those lines were mostly syntactically correct, and eliminated lots of extra work for Pass#2. Now that all the data we need has been gathered and analyzed by the earlier sub-processes of Pass#2 (and the rest of the syntax checks have been performed), the vital parts of every statement are isolated and ultimately converted into Intermediate Language. I just finished the last remaining IL logic parser, which means that it’s time to focus on the actual IL. By “IL logic” I mean the way program flow control structures such as If…EndIf, For…Next, and Repeat…Until form their final “jump there, do that” labels and expressions. There’s still work to be done in order to finish IL creation.
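C still has `goto`, so it can show the “jump there, do that” shape directly. The following is only an illustrative sketch of how If…EndIf, For…Next, and Repeat…Until could be lowered to labels and conditional jumps; the label names and the exact lowering are assumptions, not the CoolBasic compiler’s actual output.

```c
#include <assert.h>

/* If x < 0 ... EndIf: jump over the body when the condition is false. */
int abs_value(int x)
{
    if (!(x < 0)) goto endif_1;
    x = -x;
endif_1:
    return x;
}

/* For i = 1 To n ... Next: the exit test sits at the top of the loop. */
int sum_to(int n)
{
    int total = 0;
    int i = 1;                   /* loop initializer */
for_test:
    if (i > n) goto for_end;     /* leave once the counter passes the limit */
    total += i;                  /* loop body */
    i++;                         /* "Next" increments the counter... */
    goto for_test;               /* ...and jumps back to the test */
for_end:
    return total;
}

/* Repeat ... Until x <= 1: the test sits at the bottom,
   so the body always runs at least once. */
int count_halvings(int x)
{
    assert(x >= 1);
    int steps = 0;
repeat_top:
    x /= 2;
    steps++;
    if (!(x <= 1)) goto repeat_top;  /* jump back while the Until condition is false */
    return steps;
}
```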

Basically, expressions still need to be broken down into simple instructions like stack pushes/pops and mathematical operator calls. And that’s the next task: finishing up IL generation. This also includes creating the class initializer and finalizer methods, as well as generating the global and local variable spaces for classes. I hope the compiler frontend will be finished by early summer. When that is done, I can start focusing on higher-level CoolBasic language services such as linked lists of various sorts. My plan is to provide a “somewhat complete” version for the closed alpha (to be announced at a later time).

So… lots of cool things have been added to the most recent development build of the compiler, lots of fixes and improvements have been made, and lots of testing has been conducted as a by-product of all this. Overall, everything is looking bright at the moment, and no major problems have occurred so far. I’d like to conclude this blog post with the following code snippet:

Dim k As Integer = 5

If 1 + 2

	Dim i As Integer // This is a local variable that is only visible in *this* scope
	Dim k As Integer // Error! Local variable hides another local variable

EndIf

If k

	Dim i As Integer // This is a completely different variable "i"

EndIf

i = 10 // Error! Variable "i" is not declared in this scope
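The scoping rules in the snippet can be checked with a simple stack of declared names. Below is a minimal C sketch, assuming all locals of one function share a stack and each block records where its own names begin; the `Locals` type and function names are hypothetical, and the top-level `Dim k` is treated as the outermost local.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LOCALS 64
#define MAX_DEPTH 16

/* Hypothetical checker state: every local of one function on one stack,
   plus the stack index at which each open block's names begin. */
typedef struct {
    const char *names[MAX_LOCALS];
    size_t count;
    size_t block_start[MAX_DEPTH];
    size_t depth;
} Locals;

static bool is_visible(const Locals *l, const char *name)
{
    for (size_t i = 0; i < l->count; i++)
        if (strcmp(l->names[i], name) == 0) return true;
    return false;
}

/* Entering a block (e.g. If). */
void open_block(Locals *l)
{
    assert(l->depth < MAX_DEPTH);
    l->block_start[l->depth++] = l->count;
}

/* Leaving a block (EndIf) drops every name declared inside it. */
void close_block(Locals *l)
{
    assert(l->depth > 0);
    l->count = l->block_start[--l->depth];
}

/* Dim: fails if the name is already visible, which covers both a duplicate
   in the same block and a local that would hide an outer local. */
bool declare(Locals *l, const char *name)
{
    if (is_visible(l, name)) return false;
    assert(l->count < MAX_LOCALS);
    l->names[l->count++] = name;
    return true;
}

/* A use such as "i = 10" is valid only if the name is currently visible. */
bool use(const Locals *l, const char *name)
{
    return is_visible(l, name);
}
```

Replaying the snippet against it: declaring `k` inside the first block fails, the second `i` succeeds because the first one was dropped at EndIf, and the final use of `i` is rejected.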
