Data structures example: linked lists

What’s a linked list ?

A linked list is very similar to an array in a way that both are structures capable of storing elements sequentially. However, taking a closer look reveals that they work totally different internally. Like all arrays the build-in array class in ActionScript stores multiple elements in a single array instance accessed by indexing, while a linked list uses a separate object instance for each data element. This object is called a node, and besides storing the data it also needs to hold a reference to the next node in the list. The result is the most simple form of a linked list, the singly linked list:

Figure 1: A singly linked list

If a node also stores a reference to the previous node it can move in both directions, forth and back; this is obviously called a doubly linked list, which is shown below.

Figure 1: A doubly linked list

The SLinkedList and DLinkedList classes in the data structure package act as containers for storing the nodes and provide methods for manipulating (appending/deleting etc.) the list, taking care of everything. But a linked list can also be solely defined by a bunch of node objects, without using a manager class. For example, in a singly linked list, a node object could be defined like this:

class Node
{
    var data:*;
    var next:Node;
}

Then a basic functional list could be created like this:

var a:Node = new Node();
var b:Node = new Node();
var c:Node = new Node();

a.next = b;
b.next = c;
c.next = null;

Pretty simple, right? Note that you don’t have to store node ‘b’ and ‘c’ explicitly – as long as node ‘a’ is stored properly, the garbage collector has no way of removing them, so they stay alive.

A doubly linked list is more flexible and some operations like unlinking nodes from the list are faster. So why didn’t I omit the singly linked version? Because if you know in advance that you’ll only move in one direction and seldom modify the list, the singly linked version is a good choice because it needs less memory and you get a slight performance gain, as the node management is a little bit simpler behind the scenes.

Arrays vs. Linked Lists

A weakness from which an array suffers is that although it gives direct access to any array cell, it’s clumsy when it comes to modifying the array, like removing or inserting elements after it has been initialized. This is demonstrated below, where you can click a cell and drag it to remove it from the array, which is essentially a splice(nodeIndex, 1) operation.

Figure 3: An array. Drag cells to the screen edge to remove them from the array.

Now the resulting ‘gap’ is filled, and all elements right of the gap are shifted one position to the left. This takes time and gets worse the bigger the array gets. On very large arrays this operation can lead to a hefty slow down (see also the queue post) and should be avoided for real time applications. The worst-case happens when you remove the first element (shift()), while only the last element can be removed instantly (pop()). The same of course applies for inserting elements. So in the end an array is great for a stack, but bad for a queue if implemented in a naive way.

Now this is where a linked list shines, because data can be removed or inserted almost instantly, making it perfect for situations where you need a structure that constantly changes.

Figure 4: A linked list, drag nodes to the screen edge to remove them, or press I to insert a node.

As with the array demo, you can drag a node to the screen edge to unlink it from the list. In addition, you can also insert a node by pressing the ‘I’ key. Watch what happens: no large chunks of data have to be moved around in memory, just the references, visualized by the arrows, need to be updated, which is fast and simple. If you hold a node near an edge of the screen, you see how the links would change. Grey arrows represent obsolete links that will be removed together with the dragged node. The transparent blue and purple links are rerouted to ‘bridge’ the current node.
Looking at the code, each node has a small helper function to unlink it from the list that looks like this:

function unlink():void
{
    //new purple link
    if (prev != null) prev.next = next;
    
    //new blue link
    if (next != null) next.prev = prev;
    
    //obsolete grey links
    next = prev = null;
}

If the first (head) or last (tail) node is removed, this gets even simpler:

if (node == head)
{
    head = head.next;
}
else if (node == tail)
{
    tail = tail.prev;
}

If you want to remove a node from the list using the SLinkedList or DLinkedList class, you first create an iterator object pointing to the node, and then remove it later on at any time calling the remove() method on the iterator itself or the linked list:

var itr:DListIterator = new DListIterator(myList, myNode);

itr.remove();

//or
myList.remove(itr);

This is even possible if the list was modified before. With an array, you have to keep an eye on the array index of the stored item whenever you modify the array.

Another advantage of linked lists is that they are more human-readable. Modifying indices by hand is error-prone and I think the most common mistakes come from the zero-based nature of arrays (the last element is n-1 and not n). To show you the difference, the common brute force method for checking a number of objects against each other (e.g. for collision queries), implemented with arrays and linked lists are shown below.

//arrayed version
var k:int = myArr.length;
for (var i:int = 0; i < k - 1; i++)
{
    oi = myArr[i];
    for (var j:int = i + 1; j < k; j++)
    {
        oj = myArr[j];
        collide(oi, oj);
    }
}
//linked version
var node:SListNode = head;
while (node)
{
    //check node against all following nodes
    var walker:SListNode = node.next;
    while (walker)
    {
        collision(node, walker);
        walker = walker.next;
    }
    //repeat with next node
    node = node.next;
}

Iterating over a linked list

Here are the basic ways to iterate over a list, starting with the ‘cleanest’ way using the iterator pattern. It decouples the iteration from the implementation resulting in a common interface all data structures in the package support.

//get iterator instance
var itr:DListIterator = new DListIterator(list);

//or simply
itr = list.getListIterator();
    
//start iteration
for (itr.start(); itr.valid(); itr.forth())
{
    trace(itr.node, itr.data);
}

This can also be written using a while loop:

//restart
itr.start();

while (itr.valid())
{
    tr(itr.node, itr.data);
    itr.forth();
}

Both iteration styles are fine in most cases, but when performance is critical you should definitely avoid using the iterator interface. Although it offers a unified way of accessing data, the iterator pattern needs many function calls. Therefore we modify the second iteration method and change it slightly to arrive at a very lightweight iteration method:

//take the first node
var walker:DListNode = list.head;
while (walker)
{
    trace(walker, walker.data);
    walker = walker.next;
}

The code above just uses the node’s pointer to the next node to successively jump from node to node until a dead end is reached, and that happens when the tail node is referenced, which obviously has no valid next node.

Performance & optimization

Comparing the latter while loop with a standard for loop yields that it’s slightly slower, but there isn’t really a big difference between the loop bodies. Keep in mind that it’s more important what you are actually doing inside the loop body than purely looking at the loop code itself, so the numbers won’t have a measurable impact on performance. Anyway here are some numbers:

size for (i = 0; i < k; i++) while(i) {i = i.next}
1e+7 30 ms 164 ms
1e+6 5 ms 17 ms
1e+5 1 ms

But let’s move from empty loop bodies and try to access the data, which is much more interesting:

var val:*
//arrayed version, [] access
for (var i:int = 0; i < 1e+7; i++)
{
    val = myArray[i];
}
//linked version, '.' access
while (walker)
{
    val = walker.data;
    walker = walker.next;
}

Quick note: leaving val untyped is surprisingly a lot faster than strongly typed.

size array access [] object access .
1e+7 131 ms 131 ms
1e+6 13 ms 13 ms
1e+5 1 ms 1 ms

So, no difference here, meaning that linked lists can be a perfect replacement for arrays in terms of data access speed, at the same time offering superior performance when modifying big data sets.

Instead of using a list manager like the DLinkedList class, you can implement a custom ‘low-level’ linked list if you want to get the most out of the AVM. All you need is a basic node class, capable of storing data and and a reference to other nodes. But be aware that if you start playing around with references, you should assure that every reference is removed properly if unused, otherwise you quickly run into hairy problems resulting in strange behavior and memory waste/leak…

That should be all you need to know about linked lists.
As usual you can download the examples for this article here.

See also:
The graph class
The queue class
The tree class