Dynamical Simulation with Finite Elements

April 24, 2009 on 3:35 pm | In Actionscript, Links, Physics | 9 Comments

The finite element method is a very versatile method for any kind of boundary value problem. It was first extensively used in engineering as a tool for structural analysis, but it can also be used for heat flow, fluid and electrodynamical simulation to name some fields.

The underlying principle (as the name implies) is the discretization of the domain of interest into simple connected elements like triangles, rectangles, tetrahedrons or dozens more possible types. Then the physical property (force, velocity, temperature…) is evaluated at the nodes, or vertices if you look at an isolated element, of the resulting mesh. Between vertices, for example inside the volume of a tetrahedron the variable is interpolated.

Now in structural dynamics the elements represent actual physical elements with a mass and a volume (or area in 2D). In comparison to rigid bodies, usually more than one element makes up a body. Consider for example a box:
types
It can be described by two triangular finite elements, or four smaller rectangular elements, or any other mesh covering the box. As a rigid body the box would be described as a … box. So why use these elements if a ‘one element’ formulation is sufficient? Because finite elements offer a number of physical properties which just cannot be described efficiently by a rigid body. For a real time simulation the most important properties are elastic behavior, permanent deformations and breakability.
But how are these elements actually formulated? To illustrate how finite elements are described consider a system of two masses connected by a spring.

types
A spring element with nodes 1 and 2

This is a two node one dimensional element.
The spring’s elongation is linearly proportional to the applied force:

types

where δ ist the elongation, k is the spring constant which describes the spring’s resiliance to deformations and u stands for the displacement from the initial positions of the masses. In equilibrium the two forces at the nodes are opposite to each other.

types

The forces can be expressed by the previously found relation:

types

And these two formulas can be written as a matrix equation:

types

The matrix is called the stiffness matrix K and it contains the material and geometric characteristics of the element. In this simple case it is just a 2×2 matrix but the more complicated an element becomes the size of the matrix can easily reach 12×12 or even bigger sizes.
After finding the description for one lonely spring the next step is to formulate the connection between several elements. So another spring is connected at the right node of the first spring. It has the same stiffness matrix K and subsequently a similar matrix equation, but with a new force and displacement variable for the third node.

types

The two stiffness matrices must be combined to form a global stiffness matrix. For this simple example it looks like this:

types

Where the upper index denotes the element the force stems from and the lower index the node number.
This basically describes the whole procedure behind a finite element formulation: the stiffness matrix is formulated, the mesh generated and the global stiffness matrix is computed. These steps are all done before the actual simulation in which the matrix equation is solved for u, the global displacements. This is done by calculating the inverse of K,

types

which depending on the size of K can be an expensive computation and generally is one reason (the other being high memory requirements) why the finite element method is rarely used for real time physics engines which should be as fast as possible.
But none the less for few elements it is still fast enough to be useful. Hence after all this introduction it is time to present an implementation. The following demo is a dynamical simulation of a bar consisting of 16 triangular elements.

press space to start simulation

This seems at first sight fun to look at but useless, however this finite element method should be integratable into motor2, which would give the best of both worlds. A fast physics engine for rigid bodies and the possibility to simulate deformable (and possibly breakable) objects.
Besides, this is a perfect candidate for using PixelBender since it involves heavy math crunching on big matrices, which doesn’t make sense for the current Box2D iterative impulse solver. Meanwhile Michael is working hard to release the next motor2 version soon, and once it’s done we hope to expand motor2 with finite element capabilities.

Feature-based geometrical algorithms

January 11, 2009 on 3:29 pm | In Actionscript, Physics | 11 Comments

Performing collision queries over a large number of objects is computationally expensive and therefore it’s important to use clever algorithms to avoid bottlenecks. After all potentially colliding object pairs have been sorted out during the broad phase, the remaining task is to run some algorithms to determine if and where the objects intersect. Of course there are tons of ways to do this.

motor2 mainly uses a SAT approach for this task, which is fast, easy to implement and quite robust. The only disadvantage is that you end up with an unpleasant quadratic-time complexity when shapes come into contact. In hope to improve this situation I spent some time learning different collision detection algorithms and stumbled across some papers talking about feature-based algorithms which like the name implies operate on the features (vertex, edge, face) of two polygons/polyhedra to determine if they are disjoint.
To say in advance: It’s really hard to find a better replacement for SAT in 2D because things are just so simple in two dimensions (this behaves differently in 3D, where SAT can be really slow for complex polyhedral shapes, since you need to check many more separation axes…).

The Lin-Canny algorithm [1] seems to be the most commonly used of these types of algorithms. The V-Clip, or Voronoi-Clip algorithm [2] is an enhancement of Lin-Canny and according to the author offers superior numerical robustness and can handle penetrations and degenerate situations as well.
I’m not quite sure if the well-known GJK algorithm (Gilbert-Johnson-Keerthi) falls under this category as well - from the geometrical point of view it’s simplex-based and looks at the minkowski difference.

Anyway, while I found many Lin-Canny implementations, V-Clip is a bit rare and after adopting the algorithm to 2D (fun!) I was wondering why so few people are using it until Erin Catto, the brain behind Box2D, pointed out that the algorithm is patented - I simply missed that (always read the small print!).

Here’s the result:

The V-CLIP algorithm in 2D, shaded areas represent voronoi regions of the nearest features, the solid line the minimum positive distance

Let me shortly explain how the algorithm works: The basic theorem states that if X and Y are a pair of features on two separated shapes, then if X is contained in the Voronoi Region of Y and Y is contained in the Voronoi Region of X, X and Y are the closest features and thus must contain the closest points between those features. Knowing the closest points, we can then compute the distance between them and declare a collision when the distance falls below some small value.

In a physics simulation we are usually integrating over a small time step so we can assume that objects only move by a small amount (called spatial and temporal coherence). This also means the new pair of closest features is probably near the last one from the previous time step, so an almost constant query time is achieved when re-starting the algorithm using the last result.

Talking about performance, V-CLIP is roughly 2-3 times faster than SAT and even better - in most situations it performs independently of the number of polygon vertices - which makes sense since you are only looking at the “difference” from the last state rather than the whole picture. Unluckily, the last problem I’m facing preventing it from being useful is that you have to do some extra work in creating a good contact manifold, so at the end it’s actually slower than SAT. But I have not yet given up and have some ideas to get the job done faster.

If you are interested in the code, it’s included in motor2 since version 0.9.
I have now looked more closely at the open Lin-Canny algorithm which is very similar to the Voronoi Clip algorithm, especially in 2D where I don’t need some extra cases introduced by the paper. For now I have dumped the V-Clip classes from my engine (they weren’t used anyway) and will revise the algorithm in the future.

There also other interesting applications, for example it allows you to compute the time of impact (TOI) for conservative advancement.

[1] A Fast Algorithm for Incremental Distance Calculation
[2] V-Clip: Fast and Robust Polyhedral Collision Detection

Collision detection for particle systems

July 13, 2008 on 11:16 pm | In Actionscript, Physics | 11 Comments

One of the things I’m working on for motor2 is the inclusion of a particle system, mainly for simulating bullets, fluids and soft bodies. The main challenge is to build a fast particle-polygon collision detection which would be able to compute the penetration depth and collision normal for a huge amount of particles. The containment check could be done with a binary search over the vertices of the polygon, an O(n log(n)) operation, but it doesn’t give you the information to push the particle outward.

After reading some papers I finally found the solution to my problem, namely a signed distance field.
A distance field is a scalar field that measures the distance from a given point to an object or data volume. Each element in a distance field specifies its minimum euclidean distance to the shape. Positive and negative distances are used to distinguish outside and inside of the shape (thus signed). This information is precomputed and stored with the shape.

Actually it’s very simple in 2D: First, the shape’s bounding box is divided into a regular grid. Second, each grid cell is analyzed - it can be either outside, inside or intersecting one or multiple edges. For the inside and outside case, a flag is stored with the cell. For the intersecting case, all piercing edges are found and converted into planes, stored in the point-normal form, n (X - P) = 0. At the end you are left with a very simple plane-point distance check.
A lower resolution requires less memory, but makes the lookup slower. The opposite is true for very small cells, because then each cell is likely to contain only one edge to test against. The demo below shows the ‘rasterized’ distance field:

In the next demo you see a ‘pseudo color representation’ of the distance field. As you see, no information is lost, and the original shape can be reconstructed by extracting a point-based contour at true Euclidean distance and any level of accuracy:

The last demo shows the minimum translation distance vector (MDT) obtained by evaluating the distance field. The vector, with its length and direction has the required information to push the particle outward if it’s contained by the polygon:

The steps are:

  1. Transform the particle to the polygon’s local space.
  2. Evaluate the distance field and resolve collision.
  3. Transform the new particle’s position back to world space.

4 dot products, 6 additions (step 1 + 3) and in the best case one array lookup and one dot product (step 2) are required. Of course, the method can fail if a particle moves very fast, in this case the particle’s movement should be modeled as a ray, which is then clipped against the polygon.

The source code for the distance field is not yet included in the motor2 svn, since it would mess up existing classes, but it will be very soon.

Using object pools

June 18, 2008 on 4:58 pm | In Actionscript, Physics | 37 Comments

Joa Ebert is right when he says that utilizing object pools can make your code perform a lot faster. An object pool is just a container for a bunch of pre-constructed objects that are kept in memory ready for use, rather than being repeatedly allocated and destroyed on demand.

Object pooling makes sense if:

  • you create dozens of short-lived objects in real-time applications like games
  • you need to store and share temporary data throughout complex algorithms
  • the objects are expensive to create (many fields, complex inheritance chain, nested objects)
  • the objects are expensive to remove (unregister listeners, nullify instances)

The only drawback is that memory consumption will raise, but with ridiculously low memory prices this shouldn’t be problem if used wisely ;-)

Implementation

So here is my ObjectPool.as manager class which is an attempt to create a lightweight, fast and reusable solution. The implementation is based on a circular list, and the API is very simple. Download: ObjectPool_v1.0.zip (source, example, asdoc)

EDIT
I have updated the class so it also accepts a factory for object construction.
Download: ObjectPool_v1.1.zip

First, we create the object pool:

var isDynamic:Boolean = true;
var size:int = 100;

var pool:ObjectPool = new ObjectPool(isDynamic);
pool.allocate(MyClass, size);

The isDynamic flag defines the behavior for an empty pool. If true, the pool automatically creates a new bunch of objects for you. If false, the class throws an Error to indicate that the pool is empty. The size value indicates the pool’s capacity - if the pool is dynamic, the pool grows by the initial size each time it becomes empty so it actually never dries up.

By calling the allocate method the pool is filled with 100 instances of MyClass. You can always reuse the pool for another Class by invoking this method again.

If you need to initialize the objects by passing arguments to it, you can do this with a little helper method called initialize:

pool.initialize("funcName", [arg0, arg1,...]);

This goes through every object and applies the function with the given arguments upon each object. This can also be done by reading each object, calling the function and putting it back:

for (var i:int = 0; i < pool.size; i++)
{
	var o:MyClass = pool.object;
	o.init(arg0, arg1, ...);
	pool.object = o;
}

Now to get an instance of MyClass you access the pool like this:

myObjectArray[i] = pool.instance;
//instead of
myObjectArray[i] = new MyClass();

When you are done with your object, instead of throwing it into the garbage collector, you “recycle” it for the next use:

pool.instance = myObjectArray[i];

This assumes that you are storing your instances in an array or something else because if you loose the reference, well it’s lost and can’t be reused anymore :-) And be careful not to assign the object twice, since then your pool would contain duplicates of the same object!

That’s all, pretty simple right ?

Finally, there is the purge() method, which is only interesting for pools that are dynamic. As the pool grows with the demands of the application, it can get quite big. The purge methods scans the pool and removes all allocated but currently unused objects, leaving with a compact representation.

Demo

Here is a little demo which demonstrations how the pool works internally. Actually it’s very simple. Pressing the RIGHT arrow key reads an object from the pool (first row), which is then stored in the second row beneath. Pressing the LEFT arrow key gives the object back to the pool. Pressing the ENTER key performs a purge() operation. The purple circle points to the node where the next object is read, the blue circle to an empty node where the insertion is performed.

Performance

Benchmarking revealed that it’s always faster to cache instances, even for the generic Object class. All benchmarks were done with the release player 9.0.124 on Vista, an object pool size of 100 and with 100 iterations each to get an average value.

The purple bar indicates the time needed to access the pool:

for (var i:int = 0; i < k; i++) instances[i] = p.instance;

The blue bar measures the task of reading the objects, then putting them back:

for (i = 0; i < k; i++) instances[i] = p.instance;
for (i = 0; i < k; i++) p.instance = instances[i];

The grey bar shows the time needed for creating the objects on the fly:

for (i = 0; i < k; i++) instances[i] = new MyClass();

Here my result:

Caching a native flash Object can be almost 5x faster instead of creating it on the fly.

Almost the same applies to a slightly more complex object like the flash.geom.Point class.

Creating complex objects, here from the flash.display.Sprite class, is extremely slow and up
to 80x faster when pooling.

Motor Physics released

December 31, 2007 on 11:59 pm | In Actionscript, Physics | 86 Comments
100 stacking boxes at 60Hz with 30 iterations

More examples:

200 stacked boxes
polygon soup
compound shapes

So here it is - due to lack of free time I have not managed to include all planned features, but I’m working hard to include those in the next update. For now I have removed many things and tried to clean up the source a little bit. Still some parts are very messy but will be fixed in the next days, so this is more a preview than a fully functional version. I also need to setup a SVN repository, write a TODO list etc. Currently you can get the source for the preview here. The engine’s name was changed to Motor2.

The new core was written according to Erin Catto’s brilliant Box2D source. Besides many things, it features a contact graph, compound objects, a working sleeping system, restitution.., all things which I have tried to implement but the Box2D version is better and it actually works :) I recommend everyone interested in learning how to write a physics engine to look at the source.

Performance

The biggest performance gains come from efficient high-level algorithms. Most algorithms in Motor run in O(n log(n)) to O(n) in the average case, and O(n^2) in the worst case only. It’s also very important that these algorithms have an efficient low-level implementation. For Flash, this means in most situations: the fewer instructions used, the faster the code executes. This is different for the Box2D C++ version. Here memory optimizations and cache-wise design are much more important. Following are my developing guidelines:

  • precompute as much as possible
  • avoid frequent object creation/deconstruction, instead create objects once and reuse them
  • constrain function calls, inline critical parts
  • inline vector and matrix math
  • use linked lists for data structures that are constantly under modification
  • limit array access (especially nested arrays)
  • accept code duplication instead of trying to encapsulate everything
  • avoid using flash’s build in methods, e.g. the Math class
  • prefer ‘extend and override’ instead of defining interfaces (template pattern)
  • don’t cast objects often

A special word about how to deal with vector and matrix math. I haven’t defined methods for vector operations at all. I have seen many vector classes, packed with dozens of methods for performing additions, multiplications etc. This makes your code incredible slow and ugly to read, resulting in spaghetti code when concatenating many operations. Without the possibility to inline functions and overload operators I recommend to just stick with writing the expressions component-wise. (an alternative would be to use a preprocessor, which I left out to make the code more accessible)

Collisions

Because the most time in a physics simulation is spend on detecting and resolving collisions, I tried to make them as efficient as possible. I have not ported Box2D’s collision code, instead I further enhanced my existing SAT collisions. Computing the support vertex of a polygon along a given direction (aka the separation axis) is now accelerated by a BSP tree-based structure for performing extremal queries on a convex polygon [1], a O(n log(n)) search. Compared to my previous brute-force method this is about 4-6x faster. Another very important change was to include separation axis hints in a contact object - if a separation axis was found, I start the search in the next time step with the previously found axis. So when for example the AABBs of two shapes are touching, but the shapes themselves do not, you get an almost instant collision resolving.
The overall result is that it’s now 40-70% faster than my latest ‘polygon soup‘ demo. This makes it possible to simulate more than 100 stacked boxes at +60fps on a decent machine.

Broad-Phase

Motor uses a quad tree as the default broad phase, based on articles in [2] and [3]. It copes very well with varying object sizes and scene dimensions so it should be fine for a wide range of applications. Erin’s Box2D uses Sweep&Prune, which is a very attractive solution making best use of spatial and temporal coherence. The implementation looks very impressive and complex - but since I don’t want to include something I don’t fully understand I started to write a pure linked Sweep&Prune detector on my own.

Creating Shapes

motor structure shape creation

I’ve adopted Box2D’s structure for creating bodies and shapes:
A ShapeData instance defines the geometry of the shape and is responsible for computing mass, moment of inertia and the centroid. A RigidBodyData objects contains a list of ShapeData objects and in addition defines the initial state (position, velocity) of the rigid body. The RigidBodyData object is then passed to a RigidBody object, which finally creates all collision shapes from this data.

Vector representation

Polygons are stored in a data structure I call a vector chain, which is nothing more than a doubly linked circular list of 2d vectors. In addition, each vector also stores its position index and a flag indicating if it’s the last vertex in the list. So it’s something between an array and a linked list. I created this format to handle wraparounds in both directions easily and fast. You could also create an arrayed doubly linked list lookup table of vertex indices like this:

var prev:Array = [];
var next:Array = [];
for (var i:int = 0; i < k; i++)
{
	prev[i] = i - 1;
	next[i] = i + 1;
}
prev[0] = k - 1;
next[n - 1] = 0;

Retrieving the next/prev vector in the circular list can be then done this way:

//next vertex in list
v = vertexList[next[i]];
v = vertexList[prev[i]];

But nothing is simpler and faster than:

v = v.next;
v = v.prev;

Also, since everyone tends to use either plain arrays, their own custom vector classes, or the build in point class I tried to stick with plain x and y properties.

Rendering

There are two different ways to show your objects on the screen. The first and most straight-forward way is to draw all objects in world-space coordinates into one DisplayObject (you have to invoke the toWorldSpace() method on each ShapeSkeleton instance to force a WCS transform before doing so). The alternative is to draw the shape’s modeling-space outline into individual DisplayObject instances and then just update their position and rotation to get in sync with the rigid bodies. This is faster and usually the best way to draw anything else besides plain vector graphics.

The package includes a simple renderer to demonstrate how it’s done, but since rendering is not handled by the physics engine you have to write something of your own.

Coordinate spaces

Collisions are all processed in the WCS (World Coordinate System). Normally you do this by transforming object B into the local space of object A, but in the 2D case it’s easier and faster to just transform every polygon into world space and process collisions there. To avoid unnecessary transformations, they are only computed if a client requests them for rendering or the collisions need them. For example, Box-Box collisions don’t need vertices, but the Polygon-Box detector does generate vertices for both shapes.

Works cited

[1] Eberly, David in Game Physics, Morgan Kaufmann Publishers, 2004
[2] Game Programming Gems, Multi-Resolution Maps for Interaction Detection
[3] Game Programming Gems II, Direct Access Quadtree Lookup

Next Page »

Proudly powered by WordPress Theme based upon Pool theme by Borja Fernandez.
Entries and comments feeds.