A little alchemy in hx3ds

haXe provides an easy API for a bunch of special byte code instructions which were silently included in the flash player 10 release and are mainly used by the alchemy toolkit. All the functionality is found inside the flash.Memory class. Here is a basic example on how to use the Virtual Memory API:

var bytes:ByteArray = new ByteArray();
bytes.length = 1024;
flash.Memory.select(bytes);

flash.Memory.setDouble(0, 1.0);
flash.Memory.setDouble(7, 2.0);

flash.Memory.getDouble(0);
flash.Memory.getDouble(7);

Memory.select() puts the byte array into “virtual memory” so we can access it with the Memory class. Since in this case we are writing double-precision IEEE 64bit floats (aka the Number type in ActionScript) we need to offset the next position by 8 bytes.

Byte array on steroids

Reading and writing values through the Memory class is very quick. Strangely it wasn’t possible to create predictable benchmarks – the results of all my tests greatly varied across different machines. I don’t really know what’s going on behind the scenes (maybe the JIT does some magic?) but the more Memory operations are grouped together, the bigger the difference becomes. I wrote two simple benchmarks to get some numbers; Benchmark1.hx does a single read/write operation per iteration, whereas Benchmark2.hx does the same operation ten times per iteration:

benchmark

Benchmark1.hx: read/write operations relative to array access, higher is better

benchmark

Benchmark2.hx: read/write operations relative to array access, higher is better

Applying synthetic benchmarks to real world applications can be misleading, so I’ve also run some test over my data structures. The chart below shows the outcome for a bit vector. Other structures like queues, stacks or 2D-Arrays behave about the same. Another nice speed boost if you are dealing with numbers :-)

BitVector as3ds vs hx3ds

BitVector speed relative to as3ds version, higher is better

A reusable solution

The question however is how to share data across the application. The obvious way is to store your data in separate byte arrays an then call Memory.select(yourByteArray) just before accessing that data. Unfortunately this is too slow to be useful. The alternative is to allocate a big chunk of memory when your application starts and then forward all your operations to some kind of manager class. It should be responsible for allocating empty space and freeing up used space once it’s no longer needed (so you don’t run out of memory). I have implemented those ideas in a MemoryManager class, which is part of the de.polygonal.ds.mem package:

MemoryManager.allocate(4, 1);

var bitMemory:BitMemory      = MemoryManager.getBitMemory(100);
var byteArray:ByteMemory     = MemoryManager.getByteMemory(100);
var intArray:IntMemory       = MemoryManager.getIntMemory(100);
var floatArray:FloatMemory   = MemoryManager.getFloatMemory(100);
var doubleArray:DoubleMemory = MemoryManager.getDoubleMemory(100);

The first line allocates 4 KiB of total memory and 1 KiB of special “raw” memory (e.g. useful for doing math tricks like the fast inverse square root Nicolas demonstrated some time ago). The next few lines create number arrays capable of storing bits, bytes, integers, floats and doubles respectively. Reading and writing values is now very straightforward:

var intArray:IntMemory = MemoryManager.getIntMemory(100);
var value:Int = intArray.get(i);
intArray.set(i, val);

This adds some overhead because each array needs to maintain an offset address and also scale the position by the size of the data type. So we need an additional add and bit shift operation. This is how the index is computed for an “integer memory array”:

memoryIndex = memoryOffset + (integerIndex << 2);

An AS3 integer is a double word so we multiply the position by 4 (8 bytes * 4 = 32bits) and add the offset.

MemoryManager internals

Individual chunks of memory are represented by a linked list of intervals which can either be empty or full. An interval is represented by a MemoryArea object that just stores it’s start and end byte inside the byte array. Empty intervals are pushed to the left (the head of the list) so finding empty space is done as fast as possible while full intervals are pushed to the right. This is how the memory areas look like after allocating the 5 memory arrays from the top example:

memory internals

blue: empty space, magenta: used space

In order to free up used space every memory array has a purge() method. Frequent allocation and deallocation will lead to fragmentation (the same basically happens to your hard drive) so the MemoryManager class has a defragment() method that will clean up the mess. Defragmentation is done automatically in case the manager isn’t able to find a sufficient big amount of continuous space for a get[Type]Memory() request. If there still isn’t enough space available after defragmentation the class will throw an error indicating that you should raise the amount of bytes for your application.

18 Comments

  1. Hi,

    when you call flash.Memory.select(bytes) (which is mapped quite directly to something like ApplicationDomain.currentDomain. domainMemory = bytes), the FPL maps the ByteArray directly to a chunk of RAM. That way accessing the data becomes extremely fast using the new OPCodes. That might be the cause why you are getting extremely different benchmarks on different machines.
    Take into account also that the FPL performs JIT based on many different conditions, and so sometimes your code might now be JITed.

  2. mistyped sorry: might *not* be JITed

  3. Hi there, good article. But recently I ran a test to modify BitmapData pixel (by converting it to bytearray/Vector and back). And I notice that performance wise the as3 version is faster than the haxe version when using Vector. And Haxe Vector is faster than Alchemy.

  4. thanks gabriele, that makes sense. at least in my tests the Memory version was always faster than a Vector (tested only windows version with release player & -debug=false)

  5. I had the same performance improvements on mac too. Vector is faster then Array, but a lot slower then accessing directly the memory because of bounds checks and other stuff that make the Vectory data structure “safe”.

    Did you try to implement your data structures directly using C/C++ and compiling them using Alchemy ? Thanks to the tatically compiled generated code, you *may* obtain big performance improvements. I say *may* because it doesn’t always work – unfortunately ;)

  6. How come testRawDoubleMemory() is slower than testMemoryManagerDoubleMemory() for me? This doesnt make sense, the memory manager ultimately calls the same Memory.get/setDouble() but with more function call overheads, and a getIndex() lookup.

    Besides this mystery I like what you’ve done here. Thanks.

  7. the difference is very small (about 3-5%), and since the flash timer isn’t very accurate the results are very close. Also, there are no function calls since the methods are all inlined. So the only difference is computing the index, which is also very simple.

  8. I wonder if this functionality could be wrapped into swc and used from normal flex project. There would be a small performance hit as you need wrapper methods to create the arrays, but would it work?

    1. I managed to compile the complete hx3ds package into a swc file (to be released soon) and everything works fine. The problem is that as3 does not support function inlining, which makes the Memory classes much slower than a Vector because every access to the memory needs a function call.

  9. Hi! Whenever I try to use a HashMap I get this error:

    1044: Interface method get size in namespace de.polygonal.ds:Collection not implemented by class de.polygonal.ds:HashMap.

    Can you tell me why? Thanks!

    Kyle

    1. I can’t reproduce this with latest swc version:
      var hashmap:HashMap = new HashMap();
      var size:int = hashmap.size();
      trace(size); //0

  10. I found out the problem. You can’t have internal classes declared in the same file as other classes or something like that. I think it was a CS4 thing. So I had to take the HashMapIterator and the other one out of the HashMap class and create their on classes.

  11. LordDelta

    Unfortunately, I feel these tests may be a bit out-dated.

    Although I couldn’t even run the entire test due to a script-time-out error, the Vector test actually was a couple microseconds faster than the DoubleMemory test….I’m guessing that adobe has since streamlined their Vector classes, though I might be missing something?

    1. which player version/system do you use? in 10.1 beta memory access is a little slower, but it still outperforms a vector. I’ll hope they fix that for the final release.

  12. LordDelta

    Might have been an older debug player, since I tried updating the debug player and things started working in DoubleMemory’s favor. Actually I still got one time where the DoubleMemory was slower and the Vector was faster… may be something different about how Windows 7 (my plaftorm) handles the memory, or even just issues with memory flash is using being paged in/out, since I saw result vary in times, for both Vector and DoubleMemory. The results were still not differing by a factor of 2-3x, which is what one is led to believe from this article. I’ll probably have to play around with the example a bit more to see if I can’t get haxe to behave a bit better; it could also be the fact that I’m using FlashDevelop to test your file. Thanks for the suggestion, anyhow!

Trackbacks/Pingbacks

  1. [...] décidé d’utiliser le MemoryManager du projet hx3ds la version haXe du projet AS3DS API de structure de données fortement [...]

  2. [...] HaXe is cool not only for platform compatibility with things like the iPhone (and who knows – maybe other platforms like Android or Windows Mobile), but also for its language features. They deployed support for the new Flash 10 high-performance memory intrinsics before the official Adobe compiler. HaXe has generics/templates, decent conditional compilation, inlining, advanced typing a lot more. The biggest claim to fame HaXe has is a big performance margin over AS3, as well as better memory management options. [...]

Get Adobe Flash player