Sunday, November 13, 2011

Getting Started - Inserting Into A Doubly Linked List

This is a problem that I think shows how I approach programming in a pretty simple example.

Write the insert function for a doubly linked list. First I'll start with what I often see from new programmers. Don't worry, we'll clean it up.

void Insert(int location, Data* data)
{
    ListNode* ptr = m_Head;  
    if (!ptr)  
    {  
      m_Head = new ListNode();  
      m_Head->m_Data = data;  
      m_Head->m_Prev = null;  
      m_Head->m_Next = null;  
      return;  
    }  
    while(location && ptr->m_Next)  
    {  
      ptr = ptr->m_Next;  
      --location;  
    }  
    ListNode* oldNext = ptr->Next;  
    ptr->m_Next= new ListNode();  
    ptr->m_Next->m_Data = data;  
    ptr->m_Next->m_Prev = ptr;  
    ptr->m_Next->m_Next->m_Next = oldNext;  
}

OK. I think that has it. There are some bugs in there and some really tweeky ways of doing things. We won't fix this code. We will look at how to write it in a way there's nothing to fix.

Let's start at the beginning:

 void Insert(int location, Data* data)

First, it returns void so there really isn't any way to know if there are errors. And there can be no errors because the code just blasts them away by making assumptions. This restricts the ability of code use in larger projects and in projects you haven't even thought of yet.

Second, ListNode is treated like a struct. We're in C++. Even structs can do things for us.

 m_Head = new ListNode(data);  

should have been all that was necessary.

But, I want to draw attention to the fact that this function does 4 things. When I write functions they do only one thing. Sometimes that one thing is to assemble other functions. Here are the 4 things as I see it.

  1. Create a new ListNode and assign its data.
  2. Find a location in the list.
  3. Validating the position.
  4. Add the node to the location.
So how can we write this better?

void Insert(int location, ListNode data)  
 {  
    List back = Split(location);  
    PushBack(data);  
    PushBack(back);  
 }  

So that is much cleaner and has just as many bugs and quirkiness as before. But, which would be easier to fix? Let's fix it.

Iterator Insert(Iterator location, Data data)  
 {
    List back = Split(location);
    PushBack(data);
    Iterator rVal = end();
    --rVal;
    PushBack(back);
    return rVal;
 }

First, to be clear, this is not the right way to make a list class, but the methodology is the same as the STL. If you read through that code you will see many, many small functions with each changing often only a single element and passing the result down the chain. You might wonder if all I did was move the same code to another function. Even if that was all, it's an improvement. We now have three new useful functions for our list class. But let's look at what we now have.
  • If the location is the same as end() then Split() returns an empty list. So we have no conditionals in our code for end() checking. 
  • PushBack() knows it is the end so it doesn't bother checking if there is anything else to connect.
  • The second version of PushBack can handle empty Lists in whatever manner best suits the implementation. 
Any checks that remain are needed in other functionality anyway and others have been removed. The biggest thing is the code is easier to read, use, and maintain. 

There are a few notes for code I have seen far too often. Notice I do not name one function PushBackNode() and the other PushBackList(). This is unnecessary and makes the code harder to use. Second I don't have the function create the iterator or the node. PushBack() handles data to Node conversion. In the case of the node you could use a single argument constructor on ListNode and not declare it explicit. This will automatically convert data to ListNode. Validating the iterator is not the responsibility of this class. Add a hierarchy of calls to add finding and validating above the actual function that does the work. Finally, and this is very important, don't worry about function call overhead. Working in games I hear performance as an excuse for poor code far too often. It's nonsense. Good code is fast. If your code is slow, your algorithm is slow. Compilers and computers have gotten very good at dealing with many small functions. If you need to go even faster then look for dumb performance mistakes like initializing a variable unnecessarily. After you have exhausted all else, start at the top and look again.

I hope this was useful. Feel free to comment, suggest, ask questions, or request topics.

Why This Blog

I have spent a long time programming, since 1979, and I have mostly taught myself. I've had the chance to work with some amazing programmers who could make the computer dance, but rarely have I worked with programmers who could write easy to read, easy to use, easy to debug (though it rarely needed it), and easy to maintain code. Browse Amazon or a local bookstore and there are very few books on the subject and of those some are out of date, assume the reader knows too much, or is incomplete. I don't know if this blog can address those problems, but that is what I will try to do.

Jackson Pollack, Number 1, 1950
Since the intent is to introduce an approach to programming that may be different than many people use there is no target skill level. Some of the best programmers I know write code that can only be described as Pollack-esque.

It is not the intent of this blog to say there is only one way to code, nor is it my intent to judge others' code or programming style. I have found an approach that works well for me and has helped many new programmers organize their thinking.

So with that out of the way let's get on with it.