It’s been two and a half weeks since the last entry. It may seem a long time without any updates, but I’ve been working very hard on Pass#2 during this whole time. I finally managed to establish the architecture for data type resolving and name resolution. I just didn’t want to express any thoughts on this highly experimental phase until the solution that actually works on all test cases, has been found. That being said, I’ve spent the last few days on small improvements and bug fixing. Also, this is a good spot to clean up the code and fine-tune the work done so far until I get started with the next big thing (which I won’t elaborate at this point). One thing that needs improving and what has been haunting me for some time now, is the way the compiler expresses compile time errors to the user. The coding has been so hectic from the start that even though the compiler is able to express all the needed errors, some rephrasing is surely needed. Especially for Pass#1 which features a lot of syntax checks, and thus a lot of similarly formatted error messages.
Current V3 compiler error messages
For reference, compilation stops to the first error occurred – to prevent further (possibly misleading) errors from a chain-reaction the first error might have caused. This is identical to how current CoolBasic and many other Basic language compilers report errors. If you have programmed in C and the related languages, you might be familiar with the error flood you get when you try to build a project that had maybe just few small mistakes in the source code. In most cases, the tail of that list consisted of completely mysterious errors, and only because of the preceding errors caused flawed compiling process to proceed until the compiler got too confused to be able to continue. Luckily, compilers have become more intelligent regarding this matter, and for example, Visual Studio is able to construct a list with only the essential errors in it. CoolBasic V3, however, will continue to terminate the entire compilation on first error because of safety and stability. The sooner you get back to fixing it, the better.
Back to the current V3 implementation. To save time, errors are returned as strings when they occurr. I didn’t want to gather the absolute list of Pass#1 errors beforehand because it would have changed anyways. This was fully intended at that point in development. However, now that Pass#1 is essentially finished, I know all the errors that can occurr within it. And there’s now the need of rephrasing which means I would have to Find & Replace them all. Instead, it’s better to standardize the messages by binding them to constants; rephrasing them would require the change to be made to one place only. This need was also foreseen, but now it’s time to implement it. I would have changed the error message generation anyway, but this is also a great opportunity to think over better ways to express errors in terms of how they are phrased; Currently there exists a few vague messages that don’t provide enough information. For example, what does “Statement out of context” tell to the programmer?
The ideal error message
A good error message is uniformal, informative and precise. It should also provide a hint on how to correct the error. That way the programmer doesn’t have to visit the manual or spend time to understand what the error could mean in that particular case – which improves productivity and overall relish. However, being able to tell the error line, code module, the cause and the possible error code for further reference, is not enough. What if you had separarate statements written on the same line (separated with : of course), and you got an error saying “Error at line bla bla in code module bla bla: This statement cannot appear here.” More unnecessary time trying to figure out which statement caused the error. From my point of view it wouldn’t be too much of a work to bother adding this information to the error message, but omitting it would surely add up when hundreds of end users write programs in CoolBasic every day. Picture this: “Declaration context of ‘Const’ is invalid.” Sounds better? Sure, but something is still missing.
Normally I would add a hint on how to correct this error by specifying where constants are allowed or where they cannot appear. In this particular case it’s not that simple, because even though constants can appear within procedures and blocks, they can’t appear within Property or Select statement bodies. Rephrasing this error to cover exact definition where a constant can be declared would result in too complicated error message. Uniformality excludes me from listing the legal declaration contexts as well because their exact contents can vary depending on the statement. Compromises do happen.
Precise error messages also tell meta information about the error. For example, a vague error message like “Identifier is already defined.” doesn’t really tell which identifier is causing the conflict. A better version could be “‘myVariable’ is already declared in its context”, but it can be improved even further, providing more narrowing information: “Variable ‘myVariable’ is already declared in this Class.” – or even by providing the full declaration information, including the data type definition.
Whereas “precise” refers to “contextual detail”, informative errors specify moreover what’s syntactically or semantically wrong. For example, instead of telling “Syntax error. Invalid statement.” you could specify “End of statement expected.”
Detailed error messages tend to increase their amount. That generates variance which easily gets out of hand. Thus, even though you have practically different error messages for telling how different kinds of identifiers conflict in context, or what token statements expect next, they should follow the same basic formatting. This will not only result in impression of quality to the end user, but also makes the developer’s life easier in the future. Uniformality also emerges within the error codes. Even though variables and constants generate slightly different error messages when, say, their data type could not be resolved, they still share the same error code because the error occurred from the same cause. In other words: Error codes are not tied to the actual (unique) description. Something I need to keep in mind…
Showing the error to the user
The current CoolBasic version shows errors inside of a modal window with the code and description in it. Because of its modal nature, you must close the window in order to edit your code. And not only that, you can’t do anything while the windows is there so you *need* to close it. Does anyone else sometimes forget what the actual error message was, while you’re thinking and fixing? Well, I do, and it’s irritating when it happens because I will then probably compile it again to see it. Also, I don’t want to hear the Windows’ exclamation sound every time the compilation fails. In addition, there’s a known issue of Scintilla sometimes invalidating its rendering area when a modal window pops up basically making the source code invisible for quick inspection while the error window is present. So I’m experimenting with an idea to leave the latest error message visible to the bottom section of the editor, in a similar way than Visual Studio does. Also, when you click on that section, you could open the manual page for more details. You could even filter that section to show full error history or the latest error only. The error message should also be easy-to-read meaning that the error module and line number should appear in, say, inside their own respective columns.
The principle is that errors should be silent, but noticeable. And they should not create any unnecessary stir or require actions from the user.
Localize the compiler too?
I have mentioned several times that CoolBasic V3 will be fully localized into both english and finnish. This includes manual, website, forums, editor, tools, examples, tutorials and all of the content in general. There has been one hesitation to the rule, however. Normally compilers don’t get localized. It’s just the way it is, and nobody really questions it. Due to absence of real life examples and “significant” products pioneering this concept (To be honest, I don’t know about Visual Studio. I’m too lazy to check it out, but I haven’t heard about such thing), and combined with the amount of the required maintenance or the infrastructural problems within the compiler’s architecture, the idea hasn’t really catched on. Parts of the error messages would be in english anyway because common base class libraries are mostly coded in english, thus their identifier names mix to the localized language, making it look funny.
With all this rephrasing going on, it’s possible for me to build support for both english and finnish error messages. And I have decided to give it a try. You can always disable it, right? However, I will probably use english as the default compiler language even for localized CoolBasic unless separately changed from the editor’s settings.
The changes brought up in this blog entry will keep me busy for a few days…