The folly of re-use

November 7, 2006

Today I want to talk about re-using code in your programs. Firstly I want to discuss what re-use is, and the benefits it gives you, and then I want to talk about what it is not. It’s the latter that I believe is more important.

Code re-use is following the principle of “only do it once”. Writing an identical line of code in two separate places means

* modifying two lines of code if you want to add/remove/change functionality
* the potential for introducing a bug or inconsistent behaviour between the two lines
* larger bytecode/binaries

If you are doing exactly the same thing in two places, it makes sense to move that functionality into a separate method and replace the two lines with method calls. This solves all three of the problems listed above.
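For example, here is a minimal sketch of the mechanical transformation – the method names and the duplicated line are invented purely for illustration:

// Before: the same line appears in two methods
void PrintReceipt()   { Console.WriteLine("Balance: " + _balance); }
void PrintStatement() { Console.WriteLine("Balance: " + _balance); }

// After: both methods call a single shared method
void PrintReceipt()   { PrintBalance(); }
void PrintStatement() { PrintBalance(); }
void PrintBalance()   { Console.WriteLine("Balance: " + _balance); }

Now there is one line to modify, one place for a bug to live, and less compiled code.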

So, re-use is a Good Thing.

However, I believe that there is confusion as to what re-use actually is. Re-use means taking identical pieces of repeated code and moving them into methods. It does not mean taking identical concepts and moving them into methods.

A common example used in programming tutorials is to add or subtract an amount from a bank balance. A function to do that might look like this:

void ChangeBalance(double amount)
{
  _balance += amount;
}

Enthusiastic programmers look at that method and think – “Aha! I can re-use that method to do both deposits and withdrawals, thereby making my code more efficient!”. So they write method calls like this:

// add the deposit on
account.ChangeBalance(newDepositAmount);
// subtract the ATM withdrawal
account.ChangeBalance(-newWithdrawalAmount);

The ChangeBalance() method has been re-used to provide two different pieces of functionality – increasing the balance of an account, and reducing the balance of an account. That is absolutely fine as-is. However, back in the world of real applications, the ChangeBalance() method is going to be far more complicated. The method may have to check whether the account is exceeding its overdraft limit; whether the account is eligible for withdrawals; whether a new level of interest needs to be applied once a balance threshold is reached. All of these complicate the code, and yet they only apply to one of the two scenarios – either increasing or reducing the balance. It is not the code that has been re-used, but the concept – and mistakenly so. Given the new conditions we’ve added above, let’s re-write the ChangeBalance() method:

void ChangeBalance(double amount)
{
  double tempNewBalance = _balance + amount;
  if (amount < 0)
  {
    // a withdrawal: first check the account is eligible for withdrawals at all
    // (Account.WithdrawalsNotAllowed is assumed here, by analogy with OverdraftLimitExceeded)
    if (!_accountAllowsWithdrawals)
    {
      throw new Exception(Account.WithdrawalsNotAllowed);
    }
    // check the overdraft limit (_overdraftLimit is assumed to be negative, e.g. -500)
    if (tempNewBalance > _overdraftLimit)
    {
      _balance += amount;
    }
    else
    {
      throw new Exception(Account.OverdraftLimitExceeded);
    }
  }
  else
  {
    // a deposit: check whether a new level of interest now applies
    if (tempNewBalance > _currentInterestThreshold)
    {
      moveToNewInterestThreshold();
    }
    _balance += amount;
  }
}

Our original one-line method has suddenly increased to 30 lines, and that’s just to support the three conditions that I thought up off the top of my head. Real systems have plenty more conditions, account types, special cases and the like that need to be dealt with.

If you look back over the code, what we find is that although we’ve re-used the concept of modifying the account balance, all we have actually done is pollute the ChangeBalance() method with a lot of code that may not be relevant to us at all. And even worse, we have two copies of the line “_balance += amount” – the very duplication that re-use was supposed to eliminate!

What we have to do is recognise that increasing an account balance is NOT the same as reducing an account balance. It is far more sensible to have a class interface that looks like this:

void IncreaseBalance(double amount);
void DecreaseBalance(double amount);

You can then implement separate rules for increasing and decreasing balances. If you find a condition that must be fulfilled by both methods, you can factor that out into a function on its own and call it from each method.
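As a sketch of how that might look – re-using the fields from the example above, and inventing an EnsureAccountIsActive() helper purely to show a genuinely shared condition:

void IncreaseBalance(double amount)
{
  EnsureAccountIsActive(); // hypothetical condition shared by both methods
  double newBalance = _balance + amount;
  // only deposits care about crossing an interest threshold
  if (newBalance > _currentInterestThreshold)
  {
    moveToNewInterestThreshold();
  }
  _balance = newBalance;
}

void DecreaseBalance(double amount)
{
  EnsureAccountIsActive(); // the same shared condition, factored out once
  // only withdrawals care about eligibility and the overdraft limit
  if (!_accountAllowsWithdrawals)
  {
    throw new Exception(Account.WithdrawalsNotAllowed);
  }
  double newBalance = _balance - amount;
  if (newBalance <= _overdraftLimit)
  {
    throw new Exception(Account.OverdraftLimitExceeded);
  }
  _balance = newBalance;
}

Each method now contains only the rules for its own scenario, and any genuinely shared rule lives in exactly one place.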

I believe that inappropriate complexity is added to programming because coders try to re-factor up-front. Combining functionality into single methods, believing that this constitutes “re-use”, is as dangerous as trying to optimise code for performance without benchmarking it. I think that programmers’ lives could be made much easier by writing everything in a separate method, and only combining methods when it is obvious that it should happen.

Make your lives easier, don’t re-use unless you’re really eliminating duplication.

Coming from a background in C programming, I am very familiar with the “!” or unary-not operator. This operator inverts the value of the operand it is applied to, so that a true operand returns false, and a false operand returns true. As a programmer often involved in the maintenance of other people’s code, I am also very familiar with wading through pages and pages of software that appears little better than hieroglyphics.

Unary-not was widely used back in the days of C, and C++ to an extent, because the language had neither a built-in “null” nor a built-in boolean type. Some programmers or libraries would use “0” as the null value, some would use “-1”. There was no consistent way to check whether a condition was true, so unary-not did the job.

These days, in “modern” languages such as Java, C# and VB.NET we have the null value, which is assigned to any reference type variable by default. This means that we can write sensible statements like this:

if (myReferenceType != null)
{
  // do something
}

I like code that says exactly what it’s doing – in the case above, it says “if myReferenceType is not null, do something”. Consider the unary-not approach:

if (!myReferenceType)
{
  // do something
}

Reading that out, we have “if not myReferenceType, do something”. What does that statement mean? It doesn’t make sense when you read it. What if the unary-not operator has been overloaded?
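This is not an idle worry. In C#, for instance, a class can overload the ! operator, so the condition above may compile and quietly mean something other than a null check – the Connection class below is invented to illustrate:

class Connection
{
  private bool _open;

  // overloaded unary-not: "!conn" now answers "is this null OR closed?"
  public static bool operator !(Connection conn)
  {
    return conn == null || !conn._open;
  }
}

// elsewhere (GetConnection() is assumed to exist):
Connection conn = GetConnection();
if (!conn)
{
  // reached when conn is null, but ALSO when it is merely closed –
  // not what a reader expecting a null check would assume
}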

Unary-not is also often applied to boolean values, which allows us to do some frankly nasty things. Consider the following:

if (!myClass.IsAvailable())
{
  // do something
}

“If not myClass is available, do something”. If myClass isn’t available, do something. Now consider:

if (!myClass.IsNotAvailable())
{
  // do something
}

“If not myClass is not available, do something”. If myClass is not not available, do something. Or even:

if (!myClass.IsAvailable() == true)
{
  // do something
}

“If not myClass is available is true, do something”.

As you can see, we can weave all sorts of syntactical loops that are difficult to get out of. Whilst the choice of names for method calls is important (and I will focus on that in a later post), the unary-not operator causes problems of its own.

Part of the issue with unary-not is that nothing precludes the programmer from combining it with other logic operators – so you can, as above, apply it to a method call, compare the result against a boolean value, and leave the reader to untangle which part is being inverted. (In fact, because unary-not binds more tightly than “==”, it inverts the method call’s result before the comparison happens – yet another subtlety.)

My final beef with unary-not is that it’s very difficult to spot. Unlike an “==False” or a “<>0”, it hides itself at the start of a statement, as far as it’s possible to get from the functional parts of the code. As a maintenance programmer it is easy to miss, and of course it changes the whole meaning of what you’re reading.
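The fix is straightforward: put the negation somewhere the reader cannot miss it. For example, using the same invented names as above:

// instead of: if (!myClass.IsAvailable())
if (myClass.IsAvailable() == false)
{
  // the comparison sits at the end of the condition, where the eye lands
}

// or give the negation a name of its own
bool isUnavailable = !myClass.IsAvailable();
if (isUnavailable)
{
  // the intent is spelled out, and the "!" is confined to one assignment
}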

For your own good, and for maintenance programmers who come after you, don’t use unary-not!

I have recently begun a consulting stint with a large firm that is using Subversion (SVN) for its source code control system. This is the first time that I’ve used SVN in anger, although I’ve heard a lot about it – all of it positive.

SVN is the first piece of version control software I have used that follows the Modify-Commit model – that is, you can modify any file under version control at will, but you have to Commit those changes before they become part of the file’s history. This model is also used by CVS, which SVN was designed to replace.

The alternative to Modify-Commit is the Checkout-Checkin model, where a file has to be explicitly set for writing, or “checked-out”, before it can be modified. The file is then “checked-in” to store the changes in the file’s history.

It has taken me a while to get used to the M-C model, and I don’t really like it: I like to have files set to read-only until I explicitly say that I want to edit them, which is how Clearcase’s Checkout-Checkin model works.

Another difference between SVN and Clearcase is SVN’s repository-wide versioning model. If you modify a single file in SVN and commit that change, a new revision number is applied to the entire repository – files completely unrelated to the change fall under the new revision number too. This makes the revision graph virtually useless. In contrast, the Clearcase version tree browser, which shows the history of each element on its own, is the most useful piece of functionality I have ever come across.

The merge-comparison and conflict-resolution tools in SVN are pretty dire, far less intuitive than the Clearcase “merge manager” (or “merge mangler”, to its friends).

The biggest problem I have with SVN, however, is the branching model. When creating a branch and doing a “checkout” to get the latest files, it is far too easy to create your files in the wrong directory structure, since SVN doesn’t enforce any kind of directory layout on you. Contrast this with Clearcase, where any static view is created under “C:\Clearcase\vob-name\”. This is one occasion where I’m happy to trade flexibility for consistency: I need to know that the half-hour checkout process I’m about to undertake is going to end up in the correct directory.

Clearcase allows you to create branches from any point in your source-code repository and then merge them back into your “main” branch. Likewise, SVN allows you to create branches from your “trunk” and then merge them back in. SVN, however, doesn’t seem to handle the merging of directory changes properly. This means, for example, that if you add a new file (“newfile.cs”) to your branch directory and then merge your changes into trunk, “newfile.cs” does not appear. Instead you get the error “Skipped missing target”. I haven’t yet figured out how to get new files merged from one branch to another, short of manually copying the files and then adding them in.

My final issue with SVN is one that applies to most open source software, and that is “fitness for purpose”. I have managed to get SVN to tie itself in knots a couple of times, which has meant I’ve had to delete the .svn directories, back up my changes, re-checkout and then apply my changes manually – basically, the SVN equivalent of a “blue screen of death”. That is OK if I’m working on a personal project, or perhaps in a very small team on a limited codebase. Unfortunately, on a global project, with millions of lines of code and teams of developers working in different locations, sometimes to differing agendas, it creates chaos. Also, the documentation available is very limited, and in some areas it is non-existent. More-than-trivial branching is one area where I feel I am “writing the book”. Again, that’s fine in a personal/amateur environment, but at the enterprise level it is simply not sustainable.

SVN is OK, and as a piece of software I am sure it is a great technical achievement, but it is not an enterprise-level source control system, and it should not pretend to be one. In the final analysis, SVN’s shortcomings are not the fault of SVN or its programmers; they are the fault of the decision-makers who take an amateur open-source product and trust a multi-billion-dollar firm’s entire codebase to it.
