Experimental file diff

Before we begin, I’d like to say that the feature demonstrated herein will NOT be part of the first release of Cool Developer and CoolBasic Classic, but I’m merely playing with a cool idea instead.

I’ve been interested in versioning and Source Control technologies in general lately. One part of version controlling is the ability to handle conflicts when two or more developers check out a file for editing, do their changes and then check them back in. Two simultanous check-outs will often result in a conflict within the source file, and these conflicts have to be resolved through merging. Merging means that both developers’ modifications and fixes are applied to the source file, without the other’s changes getting lost in the process. Merging is one thing, but analyzing the differences is another. I decided to try out some difference tracking techniques.

First of all, there are number of algorithms available. Some are easier to understand than others, and some support more features (such as the ability to detect moved lines and code blocks). The most popular method I found was the Longest common subsequence problem, and it is, in fact, used in most Version Control software. It’s a good method of producing a script that can be used to “patch” the destination file with tiny changes so that it’ll transform to that of the source file. Basically you get a set of “add this to place X” and “remove this from place Y“. Unfortunately, while this technique indeed gets the job done nicely, visualizing these kinds of changes can be confusing. When a conflict occurs, for example, determining whether the merge would break something, can be tricky.

In addition, due to how LCS works, comparing two very big files (10,000 lines or more) quickly starts to eat giant chunks of memory because the algorithm operates with a matrix of n*m, where n is the number of code lines within the first file, and m is the number of code lines within the second one. There are a number of ways how to optimize memory consuption and matrix size, but that is out of scope of this blog bost.

Ideally, when a developer encounters a conflict (and a merge needs to be done), he or she will be provided with two source files side-by-side; one from the server and the other is the local file. This view should visually present those parts that differ between the two versions. If the developer spots a problem that would break the code upon auto-merge, there should be a way to edit the file before committing. LCS cannot by itself provide good enough visual presentation because it can only tell insertions and deletions, but not modifications. For example, editing a single line would produce both “delete” and “addition” changesets.

Then I came across with this Patience Diff algorithm. All in all, it would seem to fit to the purpose perfectly – it offers a very nice and clear presentation of changesets between two files and doesn’t take an awfully lot of memory either. I spent hours and hours trying to find a .NET implementation of it, but as of yet there apparently just aren’t any (the number of “good” C# implementations of the other algorithms is substantially small aswell). So I started working with a proof-of-concept. I spent a little less than 2 days on this challenge, and finally came up with a working solution. When I’m writing this, my work is the only C# implementation of the Patience Diff algorithm that I know of 🙂 . As a side note, there’s seemingly only one Version Control product that uses Patience Diff as its main tool, Bazaar, but apparently one can optionally enable it for Git aswell.

Here’s a sample of two slightly different files: the left side represents the original document, whereas the edited version is on the right:

Different files

The following difference graph can be generated from them (green = insert, red = delete, yellow = modify):

Difference analysis

So what does this mean for CoolBasic? I don’t really know yet, but I have some ideas 🙂

Copyright © All Rights Reserved · Green Hope Theme by Sivan & schiy · Proudly powered by WordPress