Linked list performance

It’s a fact that haXe creates swf files running faster compared to mxmlc. Amazingly, haXe is also faster at compile time. But to which extent are haXe generated swf files faster? Optimizing code especially pays off for “low-level stuff” like data structures, the building blocks of many algorithms. With this in mind, I picked the doubly linked list class from my ds library (extensively used in almost all my projects) and played around with different optimization levels to figure out how those translate into run time improvements of the flash platform.

So, which options do we have?
First, we can create “true generic classes” at compile time by adding -D generic to the haXe command line. We gain almost 40%. Why? Because we get rid of dynamic access (*) – perfect for AVM2, which is a statically typed runtime. You can read more about this here.
Next, let’s turn on function inlining, so we don’t have to deal with the overhead of calling many small functions. Inlining is always enabled by default, but you can turn it off by compiling with --no-inline, just so you can see the difference. This increases performance by another 70%.
So far we have twice the performance, thanks to smart compiler optimizations. Regarding linked lists (and many other linked structures in ds) we have one secret weapon left: “node caching”, a feature that was introduced in ds version 1.12 back in 2010. By passing a size parameter to the constructor we can reserve additional memory to minimize costly instantiation of linked list nodes:

var reservedSize = 1000;
var list = new de.polygonal.ds.DLL(reservedSize);

From now on, every time you remove an element from the list, the node object storing that element is kept for later reuse – ready to store another incoming element. The resulting behavior is an incrementally filled object pool. And don’t worry if the size exceeds the predefined size – in this case objects are simply created on-the-fly. And here are the final numbers:

benchmark results

Notice that due to additional bytecode instructions for debugging, input validation and assert statements the debug build is awfully slow, so better not use it for deploying your application ;). The release build is effectively what you get when compiling ds_release.swc with mxmlc, on par with other libraries like the LinkedList class found in the as3commons Collection package. Actually, haXe compiled SWC files are a bit faster because the generated bytecode is more efficient and private methods are still inlined, but in order to unleash the full potential of the flash player there is currently no way around haXe.

About “-D generic”

This is a custom conditional compilation switch which is available in ds. If defined, classes defining linked structures will implement the marker interface haxe.rtti.Generic. This tells the compiler to create specialized classes for each type parameter. Here is a simplified example:

class Container
#if generic
implements haxe.rtti.Generic
    public var value:T;
    public function new() {}

If we create an object of type Container like so…

var myFloatContainer = new Container();

…the compiled result will look like this…

class Container_Float
    public var value:Float;
    public function new() {}

…instead of:

class Container
    public var value:Dynamic;
    public function new() {}

Now everything is strictly typed and we don’t need slow dynamic access.

Yo, Ho! Welcome on deck, deque!

The newly released ds library (version 1.3) now includes an efficient deque implementation.

What is a deque?

A deque is a double-ended queue and pronounced “deck”. In contrast to a queue, which allows insertions at one end and removals at the opposite end, a deque allows insertions and removals at both ends. The new Deque interface defines four methods to modify the Deque along with two methods to access the element at the front (head of the list) and back (tail of the list):

interface Deque
    function front():T;
    function back():T;
    function pushFront(x:T):Void;
    function popFront():T;
    function pushBack(x:T):Void;
    function popBack():T;

There are two common ways to implement a deque – by using a doubly-linked list or by using arrays. While the first option is very simple and straightforward, the latter one is more of a challenge because there are many approaches which all have their pros and cons.

Linked implementation – de.polygonal.ds.LinkedDeque

This is basically just a stripped down, lightweight version of the doubly linked list class DLL. A doubly-linked list is capable of doing insertions & removals in constant time, but the operation itself is rather slow, since we need to instantiate a node object that stores the element and manage pointers to prevent the list from falling apart.
Another important thing to watch out is that the LinkedDeque class requires considerable more memory – for each and every element we need a container object for storing the “cargo” (+16 bytes), a reference to the previous and next node (+4 bytes each) and finally a reference that points to the element (+4 bytes). So in Flash we need a total of 28 extra bytes per element. This means that 86% of the space is wasted on nodes and pointers!
Despite those shortcomings it’s still useful for small to average sizes; and to minimize costly node object instantiation the LinkedDeque class (and all linked structures in ds) comes with built-in node pooling – all you have to do is to create a LinkedDeque object with a reserved size greater than zero:

var deque = new LinkedDeque(100); //reuses up to 100 nodes

This is very useful if the maximum size is known in advance, as performance nearly doubles. The benchmarks below were all done with “node-caching” turned on.

Arrayed implementation – de.polygonal.ds.ArrayedDeque

While most sites explain how a doubly-linked list relates to a Deque, there is not much information available on how to efficiently implement a deque on top of an array. My first attempt used a circular array similar to the ArrayedQueue class. It seemed to be a good solution until I realized that resizing the deque would be slow and difficult to implement so I discarded this idea and started from scratch. This time I decided to use an array of arrays (turned out the C++ STL deque uses this approach) and I was very happy with the result.
Let me briefly explain how it works. Data is organized in chunks or blocks of memory. A block is simply a fixed-sized array and all blocks are stored in a separate, dynamic array. Upon initialization the deque only contains a single block but once it fills up an additional block is allocated: adding elements to the back calls something like “blockListArray.push(newBlock)” and adding elements to the front calls “blockListArray.unshift(newBlock)”.
Inserting elements at the beginning of an array is slow but in this case it doesn’t matter because the list of blocks is usually very small and it doesn’t happen very often. Therefore we can say that the ArrayedDeque performs modifications at both ends in amortized constant time.


So how do both classes perform? Showing raw numbers would be pointless without being able to compare them to some reference values. For this reason I’ve created a minimal deque implementation called “NaiveDeque”:

class NaiveDeque implements Deque
    var _data:Array;
    public function new() {
        _data = new Array();
    public function front():T {
        return _data[0];
    public function pushFront(x:T):Void {
    public function popFront():T {
        return _data.shift();
    public function back():T {
        return _data[_data.length - 1];
    public function pushBack(x:T):Void {
    public function popBack():T {
        return _data.pop();

As you see it just uses what the Flash language has to offers. Clearly, the problem which the code above is that modifications at the beginning of the array are slow because a lot of memory is shifted around, either to fill a gap or to make room for a new element. So we kinda have the worst-case at the front and the best-case at the back. It’s not very fair to use this as a reference but on the other hand it shows how things can be drastically improved by using some elbow grease to write clever data structures. Here are the results:

deque benchmark

size 1000 2000 3000 4000 5000
NaiveDeque 1.0 1.0 1.0 1.0 1.0
LinkedDeque 1.65 4.4 6.6 7.2 7.7
ArrayedDeque 5.0 9.3 11.7 14.8 19.3

The benchmark results were taken by equally distributing n elements at both ends, like this:

for (i in 0...n)
    if ((i & 1) == 0)
for (i in 0...n)
    if ((i & 1) == 0)


One real-world application that comes to my mind is a software application’s list of undo operations; but as a deque is a very flexible abstract data structure it can be used in countless algorithms. The deque classes are all documented so I hope it’s clear how to use them. If you have problems feel free to contact me.