A little alchemy in hx3ds

haXe provides an easy API for a set of special bytecode instructions that were silently included in the Flash Player 10 release and are mainly used by the Alchemy toolkit. All the functionality is found in the flash.Memory class. Here is a basic example of how to use the virtual memory API:

var bytes:ByteArray = new ByteArray();
bytes.length = 1024;
flash.Memory.select(bytes);

flash.Memory.setDouble(0, 1.0);
flash.Memory.setDouble(8, 2.0);

flash.Memory.getDouble(0);
flash.Memory.getDouble(8);

Memory.select() puts the byte array into “virtual memory” so we can access it through the Memory class. Since in this case we are writing double-precision IEEE 754 64-bit floats (aka the Number type in ActionScript), each value occupies 8 bytes, so the second write has to start at byte address 8.
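To make the addressing explicit, here is a quick sketch that stores a whole sequence of doubles (assuming the same 1 KiB selection as above). Index i maps to byte address i << 3, because shifting left by three is the same as multiplying by 8:

// sketch: lay out consecutive doubles 8 bytes apart
var bytes:ByteArray = new ByteArray();
bytes.length = 1024;
flash.Memory.select(bytes);

for (i in 0...10)
    flash.Memory.setDouble(i << 3, i * 1.5); // i << 3 == i * 8 bytes

var sum = 0.0;
for (i in 0...10)
    sum += flash.Memory.getDouble(i << 3);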

Byte array on steroids

Reading and writing values through the Memory class is very fast. Strangely, it wasn’t possible to create predictable benchmarks – the results of all my tests varied greatly across different machines. I don’t really know what’s going on behind the scenes (maybe the JIT does some magic?), but the more Memory operations are grouped together, the bigger the difference becomes. I wrote two simple benchmarks to get some numbers: Benchmark1.hx does a single read/write operation per iteration, whereas Benchmark2.hx performs the same operation ten times per iteration:

Benchmark1.hx: read/write operations relative to array access, higher is better

Benchmark2.hx: read/write operations relative to array access, higher is better

Applying synthetic benchmarks to real world applications can be misleading, so I’ve also run some tests over my data structures. The chart below shows the outcome for a bit vector. Other structures like queues, stacks or 2D arrays behave about the same. Another nice speed boost if you are dealing with numbers :-)

BitVector speed relative to the as3ds version, higher is better

A reusable solution

The question, however, is how to share data across the application. The obvious way is to store your data in separate byte arrays and then call Memory.select(yourByteArray) just before accessing that data. Unfortunately this is too slow to be useful. The alternative is to allocate one big chunk of memory when your application starts and then forward all operations to some kind of manager class, which is responsible for allocating empty space and freeing used space once it’s no longer needed (so you don’t run out of memory). I have implemented these ideas in a MemoryManager class, which is part of the de.polygonal.ds.mem package:

MemoryManager.allocate(4, 1);

var bitMemory:BitMemory      = MemoryManager.getBitMemory(100);
var byteArray:ByteMemory     = MemoryManager.getByteMemory(100);
var intArray:IntMemory       = MemoryManager.getIntMemory(100);
var floatArray:FloatMemory   = MemoryManager.getFloatMemory(100);
var doubleArray:DoubleMemory = MemoryManager.getDoubleMemory(100);

The first line allocates 4 KiB of total memory and 1 KiB of special “raw” memory (e.g. useful for doing math tricks like the fast inverse square root Nicolas demonstrated some time ago). The next few lines create number arrays capable of storing bits, bytes, integers, floats and doubles respectively. Reading and writing values is now very straightforward:
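As an aside, the “math trick” mentioned above works by reinterpreting the bits of a float as an integer, which virtual memory makes trivial. A minimal sketch of the classic fast inverse square root, assuming a byte of scratch space is available at offset 0 of the selected memory (the method name and offset are made up for illustration):

class MathSketch
{
    // approximate 1 / sqrt(x) by reinterpreting float bits as an integer
    public static inline function invSqrt(x:Float):Float
    {
        var half = 0.5 * x;
        flash.Memory.setFloat(0, x);          // store float bits...
        var i = flash.Memory.getI32(0);       // ...and read them back as an int
        i = 0x5f3759df - (i >> 1);            // the famous magic constant
        flash.Memory.setI32(0, i);
        var y = flash.Memory.getFloat(0);
        return y * (1.5 - half * y * y);      // one Newton-Raphson step
    }
}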

var intArray:IntMemory = MemoryManager.getIntMemory(100);
var value:Int = intArray.get(i);
intArray.set(i, value);

This adds some overhead because each array needs to maintain an offset address and also scale the position by the size of the data type. So we need an additional add and bit shift operation. This is how the index is computed for an “integer memory array”:

memoryIndex = memoryOffset + (integerIndex << 2);

An AS3 integer is a double word, so we multiply the position by 4 (4 bytes = 32 bits) and add the offset.
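A minimal sketch of what such a wrapper might look like (the class and field names here are made up for illustration; the real IntMemory lives in de.polygonal.ds.mem):

class IntMemorySketch
{
    var offset:Int; // first byte of this array inside the selected ByteArray

    public function new(offset:Int)
    {
        this.offset = offset;
    }

    // scale the index by 4 bytes and add the base offset
    public inline function get(i:Int):Int
    {
        return flash.Memory.getI32(offset + (i << 2));
    }

    public inline function set(i:Int, x:Int):Void
    {
        flash.Memory.setI32(offset + (i << 2), x);
    }
}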

MemoryManager internals

Individual chunks of memory are represented by a linked list of intervals, each of which is either empty or full. An interval is represented by a MemoryArea object that simply stores its start and end byte inside the byte array. Empty intervals are pushed to the left (the head of the list) so that finding empty space is as fast as possible, while full intervals are pushed to the right. This is what the memory areas look like after allocating the five memory arrays from the example above:

blue: empty space, magenta: used space

In order to free up used space, every memory array has a purge() method. Frequent allocation and deallocation leads to fragmentation (basically the same thing that happens to your hard drive), so the MemoryManager class has a defragment() method that cleans up the mess. Defragmentation is done automatically if the manager isn’t able to find a sufficiently large amount of contiguous space for a get[Type]Memory() request. If there still isn’t enough space available after defragmentation, the class throws an error indicating that you should increase the total number of bytes allocated for your application.
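The search itself can be sketched as a first-fit walk over the interval list. Everything below is a made-up stand-in for the real MemoryArea and MemoryManager internals, just to illustrate why keeping empty intervals at the head makes allocation fast:

// hypothetical stand-in for MemoryArea: an interval of bytes
class MemoryAreaSketch
{
    public var b:Int;                  // start byte
    public var e:Int;                  // end byte
    public var isEmpty:Bool;
    public var next:MemoryAreaSketch;  // next interval in the list

    public function new(b:Int, e:Int)
    {
        this.b = b;
        this.e = e;
        isEmpty = true;
    }

    public function size():Int
    {
        return e - b + 1;
    }
}

// walk head->tail; since empty intervals sit at the head,
// the first empty interval that is big enough wins
function findSpace(head:MemoryAreaSketch, bytes:Int):MemoryAreaSketch
{
    var node = head;
    while (node != null)
    {
        if (node.isEmpty && node.size() >= bytes) return node;
        node = node.next;
    }
    return null; // caller would defragment() and retry, then throw
}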