A little alchemy in hx3ds

March 14, 2009 on 11:27 pm | In Actionscript, data structures | 15 Comments

haXe provides an easy API for a bunch of special byte code instructions which were silently included in the flash player 10 release and are mainly used by the alchemy toolkit. All the functionality is found inside the flash.Memory class. Here is a basic example on how to use the Virtual Memory API:

var bytes:ByteArray = new ByteArray();
bytes.length = 1024;
flash.Memory.select(bytes);

flash.Memory.setDouble(0, 1.0);
flash.Memory.setDouble(7, 2.0);

flash.Memory.getDouble(0);
flash.Memory.getDouble(7);

Memory.select() puts the byte array into “virtual memory” so we can access it with the Memory class. Since in this case we are writing double-precision IEEE 64bit floats (aka the Number type in ActionScript) we need to offset the next position by 8 bytes.

Byte array on steroids

Reading and writing values through the Memory class is very quick. Strangely it wasn’t possible to create predictable benchmarks - the results of all my tests greatly varied across different machines. I don’t really know what’s going on behind the scenes (maybe the JIT does some magic?) but the more Memory operations are grouped together, the bigger the difference becomes. I wrote two simple benchmarks to get some numbers; Benchmark1.hx does a single read/write operation per iteration, whereas Benchmark2.hx does the same operation ten times per iteration:

benchmark

Benchmark1.hx: read/write operations relative to array access, higher is better

benchmark

Benchmark2.hx: read/write operations relative to array access, higher is better

Applying synthetic benchmarks to real world applications can be misleading, so I’ve also run some test over my data structures. The chart below shows the outcome for a bit vector. Other structures like queues, stacks or 2D-Arrays behave about the same. Another nice speed boost if you are dealing with numbers :-)

BitVector as3ds vs hx3ds

BitVector speed relative to as3ds version, higher is better

A reusable solution

The question however is how to share data across the application. The obvious way is to store your data in separate byte arrays an then call Memory.select(yourByteArray) just before accessing that data. Unfortunately this is too slow to be useful. The alternative is to allocate a big chunk of memory when your application starts and then forward all your operations to some kind of manager class. It should be responsible for allocating empty space and freeing up used space once it’s no longer needed (so you don’t run out of memory). I have implemented those ideas in a MemoryManager class, which is part of the de.polygonal.ds.mem package:

MemoryManager.allocate(4, 1);

var bitMemory:BitMemory      = MemoryManager.getBitMemory(100);
var byteArray:ByteMemory     = MemoryManager.getByteMemory(100);
var intArray:IntMemory       = MemoryManager.getIntMemory(100);
var floatArray:FloatMemory   = MemoryManager.getFloatMemory(100);
var doubleArray:DoubleMemory = MemoryManager.getDoubleMemory(100);

The first line allocates 4 KiB of total memory and 1 KiB of special “raw” memory (e.g. useful for doing math tricks like the fast inverse square root Nicolas demonstrated some time ago). The next few lines create number arrays capable of storing bits, bytes, integers, floats and doubles respectively. Reading and writing values is now very straightforward:

var intArray:IntMemory = MemoryManager.getIntMemory(100);
var value:Int = intArray.get(i);
intArray.set(i, val);

This adds some overhead because each array needs to maintain an offset address and also scale the position by the size of the data type. So we need an additional add and bit shift operation. This is how the index is computed for an “integer memory array”:

memoryIndex = memoryOffset + (integerIndex << 2);

An AS3 integer is a double word so we multiply the position by 4 (8 bytes * 4 = 32bits) and add the offset.

MemoryManager internals

Individual chunks of memory are represented by a linked list of intervals which can either be empty or full. An interval is represented by a MemoryArea object that just stores it’s start and end byte inside the byte array. Empty intervals are pushed to the left (the head of the list) so finding empty space is done as fast as possible while full intervals are pushed to the right. This is how the memory areas look like after allocating the 5 memory arrays from the top example:

memory internals

blue: empty space, magenta: used space

In order to free up used space every memory array has a purge() method. Frequent allocation and deallocation will lead to fragmentation (the same basically happens to your hard drive) so the MemoryManager class has a defragment() method that will clean up the mess. Defragmentation is done automatically in case the manager isn’t able to find a sufficient big amount of continuous space for a get[Type]Memory() request. If there still isn’t enough space available after defragmentation the class will throw an error indicating that you should raise the amount of bytes for your application.

hx3ds

March 12, 2009 on 1:11 am | In Actionscript, data structures | 7 Comments

As the name suggests, hx3ds is a port of as3ds for haXe and is now available at lib.haxe.org. hx3ds only supports the flash player 10 target, as it makes extensive use of the Vector class. If you need data structures that compile across all platforms, take a look at colhx instead.

Here’s a list of new features:

  • orders of magnitude faster
  • collections now support clone() and shuffle() operations
  • object pooling framework
  • revised graph, tree and linked list classes
  • memory manager for the virtual memory API (more on this soon)

I’m planning to port the new features back to as3ds as soon as it leaves beta.

The best thing is that hx3ds runs considerable faster than as3ds. This is good news since data structures are usually used at a low level in an application and so the speed increase should be noticeable instantly. Here are some numbers for the two dimensional array:

array2 as3ds vs hx3ds

relative speed of hx3ds compared as3ds, higher is better

This difference stems mainly from haXe using proper integer opcodes, method inlining and generics. Let me shortly explain the last point: Say you want to write a custom List class and store the data within a Vector using composition. In ActionScript this would look like this:

package {
    public class MyList {
        private var _data:Vector.<Object>;
       ...
    }
}

You are forced to use a dynamic Vector because you don’t know what the users will throw in. There’s nothing wrong with it, but you can’t expect any performance wonders. As usual, some hard facts:

array vs vector

combined read/write execution speed relative to an array, higher is better

In haXe the former code example would look like this:

class MyList<T> implements haxe.rtti.Generic {
    private var _data:flash.Vector<T>;
    ....
}
...
var list:MyList<Int> = new MyList<Int>();

By implementing the Generic interface you tell the compiler to create unique classes for each type parameter, gaining both type safety and speed. After compilation the result would look like this:

class MyList_Int implements haxe.rtti.Generic {
    private var _data:flash.Vector<Int>;
    ....
}

Data Structures: More on Linked Lists

November 26, 2007 on 2:57 pm | In Actionscript, data structures | 18 Comments

I received a bunch of emails asking how to use my linked lists so I decided to revise the classes by changing and adding some methods. I did this with the Array class in mind, so hopefully the classes are now more accessible if you are used to arrays. The newest release (0.9) also contains bug fixes (thanks for reporting them!) and performance improvements. Here is a table comparing the methods of the Array and Linked List classes:

Array SLinkedList / DLinkedList
push(…args):uint append(…args):NodeReference
pop():Object removeTail():*
unshift(…args):uint prepend(…args):NodeReference
shift():Object removeHead():*
join(sep:*):String join(sep:*):String
reverse():Array reverse():void
indexOf(searchElement:*, fromIndex:int = 0) nodeOf(obj:*, from:DListIterator = null):DListNode
lastIndexOf(searchElement:*, fromIndex:int = 0×7fffffff):int lastNodeOf(obj:*, from:DListIterator = null):DListNode *1
sort(… args):Array sort(…args) *2
concat(… args) concat(…args) and merge(…args)
splice(startIndex:int, deleteCount:uint, … values):Array splice(start:NodeIterator, deleteCount:uint, …args):LinkedList

New and changed methods

New methods include join, reverse, nodeOf, lastNodeOf, concat, merge, sort, splice, shiftUp, popDown. Because you can’t refer to a linked list node by index, the nodeOf and lastNodeOf methods return an iterator referencing the node that contains the searched element instead. lastNodeOf is only defined for a doubly linked list since the singly linked version can’t go backwards.

concat works exactly like the arrayed version. It takes a comma-separated list of linked list objects, connects them and returns a larger list. Whereas concat leaves the original and passed lists unchanged, merge directly modifies the existing list, making this method a lot faster, since it just juggles some node references around, but this also means that the passed lists become invalid, so keep that in mind.

append, prepend, insertAfter and insertBefore now accept multiple arguments (it’s also faster to add various items in one shot). This matches the array’s push() and unshift() methods. Speaking of arrays, removeTail and removeHead now return the data of the removed node, like Array.pop() and Array.shift() do.

Sorting linked lists

By far the biggest addition is the ability to sort linked lists, and the new sort() is a serious method. It almost works exactly like the arrayed version. One difference is that numeric sorting is the default behavior simply because sorting numbers is far more often used in game programming. It’s also the fastest solution because the comparison function is inlined.

Another difference is that you can switch between two sort algorithms. The default sort method is the merge sort algorithm, which has the same complexity as the array’s build in quick sort algorithm, namely ÃŽËœ(n log(n)). In terms of performance it’s almost exactly half as fast as quick sort, not too bad.

Once your list is sorted and you start adding or removing elements, you can switch to the insertion sort algorithm (a stable implementation that directly works on a linked lists caused me some headaches - moving references around can be really confusing!) to resort the list. Why? The worst-case time complexity of insertion sort is Θ(n^2), occurring when numbers in the unsorted list are sorted in decreasing order from left to right. Here every number encountered needs to be swapped to the start of the list, but if the list is almost sorted, insertion sort has a linear behavior and thus outperforms other sort methods, since only a few swaps need to be done. Here are some benchmarks:

Input size Array Linked List
  quick sort insertion sort merge sort insertion sort
  unsorted nearly sorted unsorted nearly sorted unsorted nearly resort unsorted nearly sorted
1e+3 1 1 20 0 1 0 5 0
1e+4 6 3 2300 2 11 10 370 1
1e+5 80 50 fails* 20 180 320 fails* 8
1e+6 1400 600 fails* 220 3000 4500 fails* 130

As a conclusion, you can’t beat the build in quick sort for an unknown input set, only for nearly sorted data using insertion sort. Using linked lists & insertion sort is the fastest choice for maintaining sorted data in Actionscript, as it even outperforms arrays. An interesting behavior is that resorting a list with either quick or merge sort still needs quite a lot of time, in case of the merge sort it’s even slower afterwards! Besides, an efficient insertion sort implementation is only possible for doubly linked lists (the singly linked list uses the arrayed version instead). Finally some code examples:

var myList:DLinkedList = new DLinkedList();
myList.append(4, 5, 1, 2, 6, 3);

//default: merge sort, numeric
myList.sort();

//numeric, descending, insertion sort
myList.sort(SortOptions.DESCENDING | SortOptions.INSERTIONSORT);

//character-string, caseinsenstive
myList.sort(SortOptions.CHARACTER_STRING | SortOptions.CASEINSENSITIVE);

//custom compare function, list stores Player objects
function sortPlayers(a:Player, b:Player) {
	return a.health - b.health;
}
myList.sort(sortPlayers, SortOptions.INSERTIONSORT);

Quadtree demonstration

September 9, 2007 on 10:39 pm | In Actionscript, data structures | 29 Comments

I’ve been working on different approaches to speed up the collision detection stage for some while now (mainly for the motor engine and some games). This includes a Quadtree which I started working on last year, but just recently got around to pick it up again. I’m still fine-tuning the code, so I can’t share it at the moment. But, because I added some visualization stuff to help fix some bugs, I thought I could put together a little demo showing the Quadtree in action :).

I’m also thinking of creating another package (like the data structures) which would include some highly optimized broadphase collision detection algorithms, but this depends how much spare time I have - which is very limited at the moment.

But what exactly is a Quadtree ? Generally speaking, it’s a tree structure using a principle called ’spatial locality’ to speed up the process of finding all possible collisions. Because objects can only hit things close to them, testing against distant objects should be avoided.

The easiest way to quickly compute a collisions set is to divide the collision space (e.g. your game area) into a uniform grid and register each object with all intersecting grid cells. That’s also called spatial hashing. One disadvantage is that it doesn’t cope well with objects of varying sizes. If the grid cells are too big, many unnecessary collision checks are done, and if they are too small, you end up searching many registered cells containing the same object.

The quad tree tries to overcome this weakness by recursively splitting the collision space into smaller subregions, similar to the RDC algorithm, with the difference that every region is divided exactly into 4 smaller regions of the same size, so you end up having multiple grids with different resolutions, where the number of cells in a region goes go up by a power of two every time the resolution is increased. So every object resides in the cell (called quad node or quadrant) with the highest (finest) possible resolution. A search is made by starting at the object’s node and ‘bubbling’ up to the root node. Wikipedia has a nice explanation, and there exist many variations.

The quad tree in the demo contains 1000 objects. I’m not sure if the quad tree is faster than a regular grid, but I’ll compare both versions soon. And don’t be afraid of the memory usage, it needs tons of memory because I’m storing all tree nodes for drawing them on the screen.

A Quadtree containing 1000 objects. (9 levels, leaf size 8×8px)

Data structures example: linked lists

August 13, 2007 on 1:11 am | In Actionscript, data structures | 28 Comments

What’s a linked list ?

A linked list is very similar to an array in a way that both are structures capable of storing elements sequentially. However, taking a closer look reveals that they work totally different internally. Like all arrays the build-in array class in ActionScript stores multiple elements in a single array instance accessed by indexing, while a linked list uses a separate object instance for each data element. This object is called a node, and besides storing the data it also needs to hold a reference to the next node in the list. The result is the most simple form of a linked list, the singly linked list:

Figure 1: A singly linked list

If a node also stores a reference to the previous node it can move in both directions, forth and back; this is obviously called a doubly linked list, which is shown below.

Figure 2: A doubly linked list

The SLinkedList and DLinkedList classes in the data structure package act as containers for storing the nodes and provide methods for manipulating (appending/deleting etc.) the list, taking care of everything. But a linked list can also be solely defined by a bunch of node objects, without using a manager class. For example, in a singly linked list, a node object could be defined like this:

class Node
{
    var data:*;
    var next:Node;
}

Then a basic functional list could be created like this:

var a:Node = new Node();
var b:Node = new Node();
var c:Node = new Node();

a.next = b;
b.next = c;
c.next = null;

Pretty simple, right? Note that you don’t have to store node ‘b’ and ‘c’ explicitly - as long as node ‘a’ is stored properly, the garbage collector has no way of removing them, so they stay alive.

A doubly linked list is more flexible and some operations like unlinking nodes from the list are faster. So why didn’t I omit the singly linked version? Because if you know in advance that you’ll only move in one direction and seldom modify the list, the singly linked version is a good choice because it needs less memory and you get a slight performance gain, as the node management is a little bit simpler behind the scenes.

Arrays vs. Linked Lists

A weakness from which an array suffers is that although it gives direct access to any array cell, it’s clumsy when it comes to modifying the array, like removing or inserting elements after it has been initialized. This is demonstrated below, where you can click a cell and drag it to remove it from the array, which is essentially a splice(nodeIndex, 1) operation.

Figure 3: An array. Drag cells to the screen edge to remove them from the array.

Now the resulting ‘gap’ is filled, and all elements right of the gap are shifted one position to the left. This takes time and gets worse the bigger the array gets. On very large arrays this operation can lead to a hefty slow down (see also the queue post) and should be avoided for real time applications. The worst-case happens when you remove the first element (shift()), while only the last element can be removed instantly (pop()). The same of course applies for inserting elements. So in the end an array is great for a stack, but bad for a queue if implemented in a naive way.

Now this is where a linked list shines, because data can be removed or inserted almost instantly, making it perfect for situations where you need a structure that constantly changes.

Figure 4: A linked list, drag nodes to the screen edge to remove them, or press I to insert a node.

As with the array demo, you can drag a node to the screen edge to unlink it from the list. In addition, you can also insert a node by pressing the ‘I’ key. Watch what happens: no large chunks of data have to be moved around in memory, just the references, visualized by the arrows, need to be updated, which is fast and simple. If you hold a node near an edge of the screen, you see how the links would change. Grey arrows represent obsolete links that will be removed together with the dragged node. The transparent blue and purple links are rerouted to ‘bridge’ the current node.
Looking at the code, each node has a small helper function to unlink it from the list that looks like this:

function unlink():void
{
    //new purple link
    if (prev != null) prev.next = next;

    //new blue link
    if (next != null) next.prev = prev;

    //obsolete grey links
    next = prev = null;
}

If the first (head) or last (tail) node is removed, this gets even simpler:

if (node == head)
{
    head = head.next;
}
else if (node == tail)
{
    tail = tail.prev;
}

If you want to remove a node from the list using the SLinkedList or DLinkedList class, you first create an iterator object pointing to the node, and then remove it later on at any time calling the remove() method on the iterator itself or the linked list:

var itr:DListIterator = new DListIterator(myList, myNode);

itr.remove();

//or
myList.remove(itr);

This is even possible if the list was modified before. With an array, you have to keep an eye on the array index of the stored item whenever you modify the array.

Another advantage of linked lists is that they are more human-readable. Modifying indices by hand is error-prone and I think the most common mistakes come from the zero-based nature of arrays (the last element is n-1 and not n). To show you the difference, the common brute force method for checking a number of objects against each other (e.g. for collision queries), implemented with arrays and linked lists are shown below.

//arrayed version
var k:int = myArr.length;
for (var i:int = 0; i < k - 1; i++)
{
    oi = myArr[i];
    for (var j:int = i + 1; j < k; j++)
    {
        oj = myArr[j];
        collide(oi, oj);
    }
}
//linked version
var node:SListNode = head;
while (node)
{
    //check node against all following nodes
    var walker:SListNode = node.next;
    while (walker)
    {
        collision(node, walker);
        walker = walker.next;
    }
    //repeat with next node
    node = node.next;
}

Iterating over a linked list

Here are the basic ways to iterate over a list, starting with the ‘cleanest’ way using the iterator pattern. It decouples the iteration from the implementation resulting in a common interface all data structures in the package support.

//get iterator instance
var itr:DListIterator = new DListIterator(list);

//or simply
itr = list.getListIterator();

//start iteration
for (itr.start(); itr.valid(); itr.forth())
{
    trace(itr.node, itr.data);
}

This can also be written using a while loop:

//restart
itr.start();

while (itr.valid())
{
    tr(itr.node, itr.data);
    itr.forth();
}

Both iteration styles are fine in most cases, but when performance is critical you should definitely avoid using the iterator interface. Although it offers a unified way of accessing data, the iterator pattern needs many function calls. Therefore we modify the second iteration method and change it slightly to arrive at a very lightweight iteration method:

//take the first node
var walker:DListNode = list.head;
while (walker)
{
    trace(walker, walker.data);
    walker = walker.next;
}

The code above just uses the node’s pointer to the next node to successively jump from node to node until a dead end is reached, and that happens when the tail node is referenced, which obviously has no valid next node.

Performance & optimization

Comparing the latter while loop with a standard for loop yields that it’s slightly slower, but there isn’t really a big difference between the loop bodies. Keep in mind that it’s more important what you are actually doing inside the loop body than purely looking at the loop code itself, so the numbers won’t have a measurable impact on performance. Anyway here are some numbers:

size for (i = 0; i < k; i++) while(i) {i = i.next}
1e+7 30 ms 164 ms
1e+6 5 ms 17 ms
1e+5 - 1 ms
Figure 5: Array vs linked list loop body comparison.

But let’s move from empty loop bodies and try to access the data, which is much more interesting:

var val:*
//arrayed version, [] access
for (var i:int = 0; i < 1e+7; i++)
{
    val = myArray[i];
}
//linked version, '.' access
while (walker)
{
    val = walker.data;
    walker = walker.next;
}

Quick note: leaving val untyped is surprisingly a lot faster than strongly typed.

size array access [] object access .
1e+7 131 ms 131 ms
1e+6 13 ms 13 ms
1e+5 1 ms 1 ms
Figure 6: Array vs linked list loop data access comparison.

So, no difference here, meaning that linked lists can be a perfect replacement for arrays in terms of data access speed, at the same time offering superior performance when modifying big data sets.

Instead of using a list manager like the DLinkedList class, you can implement a custom ‘low-level’ linked list if you want to get the most out of the AVM. All you need is a basic node class, capable of storing data and and a reference to other nodes. But be aware that if you start playing around with references, you should assure that every reference is removed properly if unused, otherwise you quickly run into hairy problems resulting in strange behavior and memory waste/leak…

That should be all you need to know about linked lists. As usual you can download the examples for this article here.

See also:
The graph class
The queue class
The tree class

Next Page »

Proudly powered by WordPress Theme based upon Pool theme by Borja Fernandez.
Entries and comments feeds.