MemoryManager revised

Starting with DS version 1.1, there is now a new MemoryManager available, which was almost written from scratch. My past experience showed that the former implementation was somewhat limited, as you were forced to preallocate a fixed amount of memory prior to your application’s entry point. The new version now supports dynamic memory allocation, so memory consumption grows with application demand (you can still define an upper limit to be on the safe side). It’s now also possible to resize memory space after it has been allocated. I’ve also simplified the API and optimized the code.

Using alchemy memory in HaXe is now as simple as writing:

var integerArray = new IntMemory(x);
integerArray.free();

The first line allocates space for storing x 32-bit integers, and the second line will free up used memory once it’s no longer needed. The same also applies to BitMemory (a bit vector), ShortMemory (16-bit integers), FloatMemory (32-bit IEEE 754 single-precision floats) and DoubleMemory (64-bit IEEE 754 double-precision floats). Each implementation provides a get/set method to read/write the value from/to index i (I hope HaXe gets support for overriding [] in the future so we can omit the get()/set() methods); an offset property that stores the memory offset in bytes and a bytes property indicating the number of allocated bytes. Custom classes can be realized by inheriting from the abstract MemoryAccess class.

memory manager demo A visual representation of the MemoryManager (roll over). Randomly allocates and deallocates memory while calling pack() at regular intervals. Colored blocks: allocated space; Black: empty space; White lines: block size.

Frequent allocation/deallocation leads to fragmentation so the user can request defragmentation by calling MemoryManager.defrag(). In addition to the defrag() method, MemoryManager.pack() frees up unused space so it can be garbage collected. To be exact: All existing bytes are copied from the current ByteArray accessed by the alchemy memory opcodes to a smaller ByteArray, and once the reference to the former ByteArray is GC’ed the former content is deallocated (setting the .length property of the ByteArray seems to be a bit flaky and sometimes results in a runtime exception).

A quick note about MemoryManager.BLOCK_SIZE_BYTES: This value defines the growth-rate of the memory. The default is 1024 bytes (must be a power of 2). Every time the MemoryManager runs out of memory, a multiple of this block size is added. The block size affects performance and storage efficiency. As a rule of thumb the block size should match the average size of all ‘memory arrays’ used in the application. Using a tiny block size when storing 2k images is obviously a bad idea.

A major advantage of using alchemy memory is that you can use the appropriate data size that fits your needs which brings down memory usage. For example if you know in advance that your numbers are in the range 0…100 you can use bytes, or if you need floats without full 64-bit precision, you can fall back to 32-bit floats. In AS3, you are constrained to 32-bit integers or 64-bit double precision floats. Using the ByteArray is not an option because it’s too slow.

Performance

Although the preview version of FP 10.1 slowed down memory access a bit (it seems that the latest release fixes most performance issues) using those special alchemy memory opcodes is by far the best way to work with numbers. Time for some real world examples!

Example 1: JPEG encoding

After porting the JPEG encoder from the Flex framework (mx.graphics.codec) to HaXe I’ve encoded an empty 1024×786 bitmap and compared the numbers:

The HaXe version encodes the image in about 100 milliseconds, the AS3 version takes almost a second.

Example 2: Bit vectors

My next test compares an AS3 bit vector implementation from generalrelativity.org with the BitMemory from the ds package:

Combining alchemy memory, optimized byte code and inlining gives some excellent results :)

Example 3: FP10 drawing API

The last test draws triangles with the FP10 drawing API and measures the time needed to build a GraphicsPath object from a command and a data vector (without calling graphics.drawGraphicsData()). The HaXe version uses my VectorRenderer class which utilizes alchemy memory as a temporary buffer.

All results are based on the windows release player 10,0,42,34 and use the MemoryManager class.

A little alchemy in hx3ds

haXe provides an easy API for a bunch of special byte code instructions which were silently included in the flash player 10 release and are mainly used by the alchemy toolkit. All the functionality is found inside the flash.Memory class. Here is a basic example on how to use the Virtual Memory API:

var bytes:ByteArray = new ByteArray();
bytes.length = 1024;
flash.Memory.select(bytes);

flash.Memory.setDouble(0, 1.0);
flash.Memory.setDouble(7, 2.0);

flash.Memory.getDouble(0);
flash.Memory.getDouble(7);

Memory.select() puts the byte array into “virtual memory” so we can access it with the Memory class. Since in this case we are writing double-precision IEEE 64bit floats (aka the Number type in ActionScript) we need to offset the next position by 8 bytes.

Byte array on steroids

Reading and writing values through the Memory class is very quick. Strangely it wasn’t possible to create predictable benchmarks – the results of all my tests greatly varied across different machines. I don’t really know what’s going on behind the scenes (maybe the JIT does some magic?) but the more Memory operations are grouped together, the bigger the difference becomes. I wrote two simple benchmarks to get some numbers; Benchmark1.hx does a single read/write operation per iteration, whereas Benchmark2.hx does the same operation ten times per iteration:

benchmark

Benchmark1.hx: read/write operations relative to array access, higher is better

benchmark

Benchmark2.hx: read/write operations relative to array access, higher is better

Applying synthetic benchmarks to real world applications can be misleading, so I’ve also run some test over my data structures. The chart below shows the outcome for a bit vector. Other structures like queues, stacks or 2D-Arrays behave about the same. Another nice speed boost if you are dealing with numbers :-)

BitVector as3ds vs hx3ds

BitVector speed relative to as3ds version, higher is better

A reusable solution

The question however is how to share data across the application. The obvious way is to store your data in separate byte arrays an then call Memory.select(yourByteArray) just before accessing that data. Unfortunately this is too slow to be useful. The alternative is to allocate a big chunk of memory when your application starts and then forward all your operations to some kind of manager class. It should be responsible for allocating empty space and freeing up used space once it’s no longer needed (so you don’t run out of memory). I have implemented those ideas in a MemoryManager class, which is part of the de.polygonal.ds.mem package:

MemoryManager.allocate(4, 1);

var bitMemory:BitMemory      = MemoryManager.getBitMemory(100);
var byteArray:ByteMemory     = MemoryManager.getByteMemory(100);
var intArray:IntMemory       = MemoryManager.getIntMemory(100);
var floatArray:FloatMemory   = MemoryManager.getFloatMemory(100);
var doubleArray:DoubleMemory = MemoryManager.getDoubleMemory(100);

The first line allocates 4 KiB of total memory and 1 KiB of special “raw” memory (e.g. useful for doing math tricks like the fast inverse square root Nicolas demonstrated some time ago). The next few lines create number arrays capable of storing bits, bytes, integers, floats and doubles respectively. Reading and writing values is now very straightforward:

var intArray:IntMemory = MemoryManager.getIntMemory(100);
var value:Int = intArray.get(i);
intArray.set(i, val);

This adds some overhead because each array needs to maintain an offset address and also scale the position by the size of the data type. So we need an additional add and bit shift operation. This is how the index is computed for an “integer memory array”:

memoryIndex = memoryOffset + (integerIndex << 2);

An AS3 integer is a double word so we multiply the position by 4 (8 bytes * 4 = 32bits) and add the offset.

MemoryManager internals

Individual chunks of memory are represented by a linked list of intervals which can either be empty or full. An interval is represented by a MemoryArea object that just stores it's start and end byte inside the byte array. Empty intervals are pushed to the left (the head of the list) so finding empty space is done as fast as possible while full intervals are pushed to the right. This is how the memory areas look like after allocating the 5 memory arrays from the top example:

memory internals

blue: empty space, magenta: used space

In order to free up used space every memory array has a purge() method. Frequent allocation and deallocation will lead to fragmentation (the same basically happens to your hard drive) so the MemoryManager class has a defragment() method that will clean up the mess. Defragmentation is done automatically in case the manager isn't able to find a sufficient big amount of continuous space for a get[Type]Memory() request. If there still isn't enough space available after defragmentation the class will throw an error indicating that you should raise the amount of bytes for your application.