Implementing Strings

Every time I start refining CoolBasic, a handful of issues keep haunting me: how do I implement them the “proper” way? One of these topics is string handling. Strings are not plain values: their length, and therefore their memory consumption, is dynamic, so they can’t be stored on the stack where all integers and floats reside. Strings work as references instead, meaning that a string variable is really an integer pointing to the position in memory that holds the actual string data. Every time a string is modified in a way that changes its length, a new memory block is allocated for the new string data and the old string is freed. Basically, the string pointer changes in order to keep the reference up to date.
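To make the reference idea concrete, here is a minimal Python sketch. It is only a toy model, not CoolBasic's actual runtime: the `heap` dictionary, the integer "pointers" and the `alloc` and `append` helpers are all names made up for illustration.

```python
import itertools

heap = {}                        # pointer (int) -> string data; stands in for real memory
_pointers = itertools.count(1)   # hypothetical source of fresh addresses


def alloc(text):
    """Store text in a new block and return its pointer."""
    ptr = next(_pointers)
    heap[ptr] = text
    return ptr


def append(ptr, suffix):
    """Changing the length means: allocate a new block, free the old one."""
    new_ptr = alloc(heap[ptr] + suffix)
    del heap[ptr]                # the old string is freed
    return new_ptr


s = alloc("Cool")                # the variable holds only the pointer
old_ptr = s
s = append(s, "Basic")           # the pointer changes to stay up to date

print(old_ptr, s, heap)          # 1 2 {2: 'CoolBasic'}
```

The variable never contains the characters themselves, only the number that says where they live.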

Because of this behaviour, it’s problematic to free a string along with the rest of the values when, say, the containing procedure, block or class instance ceases to exist. You’d just lose the pointer, but the string itself would remain in memory with no way to access it again! You could require the programmer to manually destroy all reference objects when needed, but it’d be kind of stupid to make strings part of that, since they are intrinsic data types (i.e. the types with literal values that come built into CoolBasic). That’s why you don’t want to force the use of the “New” keyword when declaring string values and variables. We’re after VB.NETish syntax, after all.

At first I thought of implementing strings as a true class written in CoolBasic. However, that would mean including the full class source at the beginning of every CoolBasic program, increasing the line count and thus the processing time. It would also require the standard New-assignment for each string, although that was not the main problem, since a pre-compiler could easily transform the syntax during compilation. On the other hand, if CoolBasic had an automatic garbage collection system, strings would get deleted automatically with the rest of the unreferenced objects. However, CoolBasic has no such mechanism (the programmer needs to take care of freeing objects by calling their destructors).

The following code will illustrate the loss-of-pointer problem. Consider:

Dim a As String = "A"
Dim b As String = "B"
a = b.Trim()

First, the pointer of string “B” is evaluated and pushed onto the stack, followed by the function call Trim(). A new string with leading and trailing whitespace removed is created, and its pointer is pushed onto the stack, replacing the original pointer. At this point, variable “a” still contains a pointer to the existing string “A” in memory. But that pointer will be replaced in the assignment, and since strings are always unique (i.e. two variables cannot refer to the same string in memory), the pointer to string “A” will be lost for good! The string will remain in memory with no way to access or free it from the program. This is called a memory leak, and it is considered bad programming. It can eventually exhaust the available memory.
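The leak can be reproduced in a small Python simulation. This is a sketch under assumptions: the `heap` dictionary, the `alloc` helper and the `variables` table are hypothetical stand-ins for memory, the allocator and the stack, and Python's `strip()` plays the role of Trim().

```python
import itertools

heap = {}                        # pointer (int) -> string data
_pointers = itertools.count(1)   # hypothetical source of fresh addresses


def alloc(text):
    """Store text in a new block and return its pointer."""
    ptr = next(_pointers)
    heap[ptr] = text
    return ptr


variables = {}
variables["a"] = alloc("A")      # Dim a As String = "A"
variables["b"] = alloc("B")      # Dim b As String = "B"

# a = b.Trim(): Trim() creates a new string and yields its pointer
trimmed = alloc(heap[variables["b"]].strip())

# Naive assignment: the old pointer in "a" is overwritten, nothing is freed
variables["a"] = trimmed

# Any block that no variable points to can never be reached or freed again
reachable = set(variables.values())
leaked = {ptr: text for ptr, text in heap.items() if ptr not in reachable}

print(leaked)                    # {1: 'A'}
```

The block holding “A” is still allocated, but no variable knows its address any more.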

Luckily, there are only three things that need to be taken into account to prevent this. Firstly, every time a string literal occurs, it is copied and the pointer to the (new) copy is pushed onto the stack. The template never changes, and we now have two identical strings at different memory locations. Secondly, every time there is an assignment to a string variable, the compiler adds a .Finalize property to it. And finally, we will take advantage of one of the new features of CoolBasic V3: class properties! Every time a scope ends (be it a procedure, a class destructor or just a code block), .Finalize = Nothing calls are automatically added for all string variables, including arrays. Finalize is a WriteOnly property of the intrinsic String class which acts as a delegate function. This means that we can inject some program code before the actual value assignment happens. Basically, we’ll just deallocate any existing string at the current pointer and then assign the new pointer.

The compiler will transform string assignments like this:

a.Finalize = b.Trim()
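Here is roughly how that could behave, modelled in Python. Strings are not implemented in CoolBasic yet, so treat this as a sketch of the concept only: `StringVar`, `heap` and `alloc` are invented names, a setter-only Python property stands in for the WriteOnly Finalize property, and `None` stands in for Nothing.

```python
import itertools

heap = {}                        # pointer (int) -> string data
_pointers = itertools.count(1)   # hypothetical source of fresh addresses


def alloc(text):
    """Copy text into a new block (as for a literal) and return its pointer."""
    ptr = next(_pointers)
    heap[ptr] = text
    return ptr


class StringVar:
    """A string variable: just a pointer into the heap (None = Nothing)."""

    def __init__(self):
        self.ptr = None

    def _set_finalize(self, new_ptr):
        # Injected code: free the existing string before taking the new pointer
        if self.ptr is not None:
            del heap[self.ptr]
        self.ptr = new_ptr

    finalize = property(fset=_set_finalize)   # setter only, so write-only


a, b = StringVar(), StringVar()
a.finalize = alloc("A")                       # Dim a As String = "A"
b.finalize = alloc("B")                       # Dim b As String = "B"
a.finalize = alloc(heap[b.ptr].strip())       # a.Finalize = b.Trim()

after_assignment = dict(heap)
print(after_assignment)                       # {2: 'B', 3: 'B'}

# End of scope: the compiler would add .Finalize = Nothing for every string
for var in (a, b):
    var.finalize = None

print(heap)                                   # {}
```

The old “A” block is released at the moment of assignment, and nothing is left allocated once the scope ends.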

Of course, this behaviour differs from normal instance variable handling, where there can be several references to the same object. So far, this is the most sophisticated and proper way of handling strings that any CoolBasic generation has had, and I’m very happy about it right now. Strings are not implemented yet, but at least the concept is now thought through.
