April 24, 2009 on 3:35 pm | In Actionscript, Links, Physics | 9 Comments
The finite element method is a very versatile method for any kind of boundary value problem. It was first extensively used in engineering as a tool for structural analysis, but it can also be used for heat flow, fluid and electrodynamical simulation to name some fields.
The underlying principle (as the name implies) is the discretization of the domain of interest into simple connected elements like triangles, rectangles, tetrahedrons or dozens more possible types. Then the physical property (force, velocity, temperature…) is evaluated at the nodes, or vertices if you look at an isolated element, of the resulting mesh. Between vertices, for example inside the volume of a tetrahedron the variable is interpolated.
Now in structural dynamics the elements represent actual physical elements with a mass and a volume (or area in 2D). In comparison to rigid bodies, usually more than one element makes up a body. Consider for example a box:

It can be described by two triangular finite elements, or four smaller rectangular elements, or any other mesh covering the box. As a rigid body the box would be described as a … box. So why use these elements if a ‘one element’ formulation is sufficient? Because finite elements offer a number of physical properties which just cannot be described efficiently by a rigid body. For a real time simulation the most important properties are elastic behavior, permanent deformations and breakability.
But how are these elements actually formulated? To illustrate how finite elements are described consider a system of two masses connected by a spring.
A spring element with nodes 1 and 2
This is a two node one dimensional element.
The spring’s elongation is linearly proportional to the applied force:
where δ ist the elongation, k is the spring constant which describes the spring’s resiliance to deformations and u stands for the displacement from the initial positions of the masses. In equilibrium the two forces at the nodes are opposite to each other.
The forces can be expressed by the previously found relation:
And these two formulas can be written as a matrix equation:
The matrix is called the stiffness matrix K and it contains the material and geometric characteristics of the element. In this simple case it is just a 2×2 matrix but the more complicated an element becomes the size of the matrix can easily reach 12×12 or even bigger sizes.
After finding the description for one lonely spring the next step is to formulate the connection between several elements. So another spring is connected at the right node of the first spring. It has the same stiffness matrix K and subsequently a similar matrix equation, but with a new force and displacement variable for the third node.
The two stiffness matrices must be combined to form a global stiffness matrix. For this simple example it looks like this:
Where the upper index denotes the element the force stems from and the lower index the node number.
This basically describes the whole procedure behind a finite element formulation: the stiffness matrix is formulated, the mesh generated and the global stiffness matrix is computed. These steps are all done before the actual simulation in which the matrix equation is solved for u, the global displacements. This is done by calculating the inverse of K,
which depending on the size of K can be an expensive computation and generally is one reason (the other being high memory requirements) why the finite element method is rarely used for real time physics engines which should be as fast as possible.
But none the less for few elements it is still fast enough to be useful. Hence after all this introduction it is time to present an implementation. The following demo is a dynamical simulation of a bar consisting of 16 triangular elements.
press space to start simulation
This seems at first sight fun to look at but useless, however this finite element method should be integratable into motor2, which would give the best of both worlds. A fast physics engine for rigid bodies and the possibility to simulate deformable (and possibly breakable) objects.
Besides, this is a perfect candidate for using PixelBender since it involves heavy math crunching on big matrices, which doesn’t make sense for the current Box2D iterative impulse solver. Meanwhile Michael is working hard to release the next motor2 version soon, and once it’s done we hope to expand motor2 with finite element capabilities.
March 14, 2009 on 11:27 pm | In Actionscript, data structures | 15 Comments
haXe provides an easy API for a bunch of special byte code instructions which were silently included in the flash player 10 release and are mainly used by the alchemy toolkit. All the functionality is found inside the flash.Memory class. Here is a basic example on how to use the Virtual Memory API:
var bytes:ByteArray = new ByteArray();
bytes.length = 1024;
flash.Memory.select(bytes);
flash.Memory.setDouble(0, 1.0);
flash.Memory.setDouble(7, 2.0);
flash.Memory.getDouble(0);
flash.Memory.getDouble(7);
Memory.select() puts the byte array into “virtual memory” so we can access it with the Memory class. Since in this case we are writing double-precision IEEE 64bit floats (aka the Number type in ActionScript) we need to offset the next position by 8 bytes.
Byte array on steroids
Reading and writing values through the Memory class is very quick. Strangely it wasn’t possible to create predictable benchmarks - the results of all my tests greatly varied across different machines. I don’t really know what’s going on behind the scenes (maybe the JIT does some magic?) but the more Memory operations are grouped together, the bigger the difference becomes. I wrote two simple benchmarks to get some numbers; Benchmark1.hx does a single read/write operation per iteration, whereas Benchmark2.hx does the same operation ten times per iteration:
Benchmark1.hx: read/write operations relative to array access, higher is better
Benchmark2.hx: read/write operations relative to array access, higher is better
Applying synthetic benchmarks to real world applications can be misleading, so I’ve also run some test over my data structures. The chart below shows the outcome for a bit vector. Other structures like queues, stacks or 2D-Arrays behave about the same. Another nice speed boost if you are dealing with numbers :-)
BitVector speed relative to as3ds version, higher is better
A reusable solution
The question however is how to share data across the application. The obvious way is to store your data in separate byte arrays an then call Memory.select(yourByteArray) just before accessing that data. Unfortunately this is too slow to be useful. The alternative is to allocate a big chunk of memory when your application starts and then forward all your operations to some kind of manager class. It should be responsible for allocating empty space and freeing up used space once it’s no longer needed (so you don’t run out of memory). I have implemented those ideas in a MemoryManager class, which is part of the de.polygonal.ds.mem package:
MemoryManager.allocate(4, 1);
var bitMemory:BitMemory = MemoryManager.getBitMemory(100);
var byteArray:ByteMemory = MemoryManager.getByteMemory(100);
var intArray:IntMemory = MemoryManager.getIntMemory(100);
var floatArray:FloatMemory = MemoryManager.getFloatMemory(100);
var doubleArray:DoubleMemory = MemoryManager.getDoubleMemory(100);
The first line allocates 4 KiB of total memory and 1 KiB of special “raw” memory (e.g. useful for doing math tricks like the fast inverse square root Nicolas demonstrated some time ago). The next few lines create number arrays capable of storing bits, bytes, integers, floats and doubles respectively. Reading and writing values is now very straightforward:
var intArray:IntMemory = MemoryManager.getIntMemory(100);
var value:Int = intArray.get(i);
intArray.set(i, val);
This adds some overhead because each array needs to maintain an offset address and also scale the position by the size of the data type. So we need an additional add and bit shift operation. This is how the index is computed for an “integer memory array”:
memoryIndex = memoryOffset + (integerIndex << 2);
An AS3 integer is a double word so we multiply the position by 4 (8 bytes * 4 = 32bits) and add the offset.
MemoryManager internals
Individual chunks of memory are represented by a linked list of intervals which can either be empty or full. An interval is represented by a MemoryArea object that just stores it’s start and end byte inside the byte array. Empty intervals are pushed to the left (the head of the list) so finding empty space is done as fast as possible while full intervals are pushed to the right. This is how the memory areas look like after allocating the 5 memory arrays from the top example:
blue: empty space, magenta: used space
In order to free up used space every memory array has a purge() method. Frequent allocation and deallocation will lead to fragmentation (the same basically happens to your hard drive) so the MemoryManager class has a defragment() method that will clean up the mess. Defragmentation is done automatically in case the manager isn’t able to find a sufficient big amount of continuous space for a get[Type]Memory() request. If there still isn’t enough space available after defragmentation the class will throw an error indicating that you should raise the amount of bytes for your application.
March 12, 2009 on 1:11 am | In Actionscript, data structures | 7 Comments
As the name suggests, hx3ds is a port of as3ds for haXe and is now available at lib.haxe.org. hx3ds only supports the flash player 10 target, as it makes extensive use of the Vector class. If you need data structures that compile across all platforms, take a look at colhx instead.
Here’s a list of new features:
- orders of magnitude faster
- collections now support clone() and shuffle() operations
- object pooling framework
- revised graph, tree and linked list classes
- memory manager for the virtual memory API (more on this soon)
I’m planning to port the new features back to as3ds as soon as it leaves beta.
The best thing is that hx3ds runs considerable faster than as3ds. This is good news since data structures are usually used at a low level in an application and so the speed increase should be noticeable instantly. Here are some numbers for the two dimensional array:
relative speed of hx3ds compared as3ds, higher is better
This difference stems mainly from haXe using proper integer opcodes, method inlining and generics. Let me shortly explain the last point: Say you want to write a custom List class and store the data within a Vector using composition. In ActionScript this would look like this:
package {
public class MyList {
private var _data:Vector.<Object>;
...
}
}
You are forced to use a dynamic Vector because you don’t know what the users will throw in. There’s nothing wrong with it, but you can’t expect any performance wonders. As usual, some hard facts:
combined read/write execution speed relative to an array, higher is better
In haXe the former code example would look like this:
class MyList<T> implements haxe.rtti.Generic {
private var _data:flash.Vector<T>;
....
}
...
var list:MyList<Int> = new MyList<Int>();
By implementing the Generic interface you tell the compiler to create unique classes for each type parameter, gaining both type safety and speed. After compilation the result would look like this:
class MyList_Int implements haxe.rtti.Generic {
private var _data:flash.Vector<Int>;
....
}
March 9, 2009 on 3:53 pm | In Actionscript, Miscellaneous | 4 Comments
haXe is fantastic! The language has some very nice features like type inference which I don’t want to miss anymore. Even better, the compiler is lightning fast and will provide you almost instantly with compiler errors, so fixing your code becomes a short and sweet procedure. There are also libraries available for writing ABC byte code or PixelBender assembler directly in haXe. Isn’t that nice ?
The only problem right now is the lack of a full blown IDE for haXe. I love FlashDevelop and use it every day. Unfortunately the auto completion for haXe in FD3 is somewhat in an experimental stage and goes crazy for classes with type parameters. Hopefully this will be improved in a future release. But otherwise the work flow is very nice and I’m currently porting my open-source libraries to haXe.
January 21, 2009 on 12:45 am | In Actionscript | 11 Comments
Just a quick tip to get the numbers right when testing the execution time of your ActionScript code; you only get correct results if you follow two steps:
-
Compile your code with the debug flag set to false (mxmlc: -debug=false). Otherwise your swf gets bloated with byte code used by the debug player.
-
Always test your swf with the release player. Most developers have set the debug player as the default one, but it doesn’t reflect real world performance because it can be much slower. If the context menu has the option “show redraw regions” you are running the debug player.
If you are not aware of this you might unintentionally spent much time in implementing some clever optimizations which appear to run faster in the debug player, but perform worse in the release player. Here is an example for the sin/cos approximation:
| Debug player |
Release player |
| Math.sin() + Math.cos() |
340 ms |
Math.sin() + Math.cos() |
142 ms |
| inline sin + cos approximation (high precision) |
440 ms |
inline sin + cos approximation (high precision) |
11 ms |
« Previous Page —
Next Page »