Trigonometric functions are costly operations and can slow down your application if they are extensively used. There are two reasons why: First, Math.sin() is a function, and thus needs a function call which simple eats up some time. Second, the result is computed with much more precision than you would ever need in most situations.

Most often you just want the periodic wave-like characteristics of the sine or cosine, which can be approximated in various ways. One common way of making it faster is to create a *lookup-table* by computing the sine at discrete steps and storing the result in an array. For example:

var sineTable:Array = []; for (var i:int = 0; i < 90; i++) { sineTable[i] = Math.sin(Math.PI/180 * i) }

Due to the symmetry of the sine wave, it's sufficient to compute one quadrant only (0..pi/2), and the other 3/4's of the circle can be computed by shifting and wrapping the input value. The biggest drawback is that the values are stored at a fixed resolution and so the result is not very accurate. This can be enhanced with linear interpolation:

x = 22.5; y = sineTable[int(x)] + (sineTable[int(x + .5)] - sineTable[int(x)]) / 2;

Much better, but yet the error exists. It also involves accessing array elements which makes the code rather slow. Another technique uses *taylor series approximation*:

sin(x) = x - (x^3)/3! + (x^5)/5! - (x^7)/7! + ...

Like with the lookup-table, evaluating this term is costly.

After searching for alternatives, I finally found a fantastic solution using a quadratic curve which blows everything away in terms of performance *and* accuracy. For a detailed derivation, please follow the link because I won't go into it.

I did minor optimizations to figure out what AS3 likes most, and arrived at some code that can be up to 14x faster, while still being very accurate. However, you have to use it directly - do not place the code inside a function, because the additional function call sweeps out the performance gain, and you are left with an approximation that is actually *slower* compared to a native Math.sin() or Math.cos() call. Also note that cos(x) = sin(x + pi/2) or cos(x - pi/2) = sin(x), so computing the cosine is just of matter adding pi/2 to the input value.

Download source: fastTrig.as.

Below is a simple visualization to show you the quality of the approximation. The high precision version can replace the Math.sin() and Math.cos() calls in nearly all situations.

### Low precision sine/cosine (~14x faster)

//always wrap input angle to -PI..PI if (x < -3.14159265) x += 6.28318531; else if (x > 3.14159265) x -= 6.28318531; //compute sine if (x < 0) sin = 1.27323954 * x + .405284735 * x * x; else sin = 1.27323954 * x - 0.405284735 * x * x; //compute cosine: sin(x + PI/2) = cos(x) x += 1.57079632; if (x > 3.14159265) x -= 6.28318531; if (x < 0) cos = 1.27323954 * x + 0.405284735 * x * x else cos = 1.27323954 * x - 0.405284735 * x * x; }

### High precision sine/cosine (~8x faster)

//always wrap input angle to -PI..PI if (x < -3.14159265) x += 6.28318531; else if (x > 3.14159265) x -= 6.28318531; //compute sine if (x < 0) { sin = 1.27323954 * x + .405284735 * x * x; if (sin < 0) sin = .225 * (sin *-sin - sin) + sin; else sin = .225 * (sin * sin - sin) + sin; } else { sin = 1.27323954 * x - 0.405284735 * x * x; if (sin < 0) sin = .225 * (sin *-sin - sin) + sin; else sin = .225 * (sin * sin - sin) + sin; } //compute cosine: sin(x + PI/2) = cos(x) x += 1.57079632; if (x > 3.14159265) x -= 6.28318531; if (x < 0) { cos = 1.27323954 * x + 0.405284735 * x * x; if (cos < 0) cos = .225 * (cos *-cos - cos) + cos; else cos = .225 * (cos * cos - cos) + cos; } else { cos = 1.27323954 * x - 0.405284735 * x * x; if (cos < 0) cos = .225 * (cos *-cos - cos) + cos; else cos = .225 * (cos * cos - cos) + cos; }

Nice find, though I think I’d have to be in a real optimization bind before I replaced all my sin/cos calls with all that stuff. :)

Once again, incredible post.

There is an error in the link. The good link is :

http://lab.polygonal.de/wp-content/articles/fast_trig/fastTrig.as

thank you, fixed now.

Woah, that’s sick! Thanks for sharing, I always stopped at the first step :/

Awesome! Let’s see how PV3D can benefit from that! ;-)

The article on DevMaster mentions that you should avoid branching, especially in loops. I wonder if the same problem applies to AS3? Is there a speed improvement if you replace the branches by binary logic twists?

Hi, it didn’t work for me (the author also says that this is not valid C code at all) Binary & converts the result to an integer. Only this works:

`x -= int(x > y) && z;`

But the code above is a lot slower than the pure if..else statements.

Nice work Michael – this one goes a long way back. I think it may have been in Hart’s ‘Computer Approximations.’

One thing I’ve found helpful when working with compilers that don’t support function inlining is to use the C preprocessor or awk or something similar to do all my debugging with library calls and then have the tool replace them with inline approximations going into production.

Another benefit of having inline code is that you can combine the sin or cos computation with whatever equation it’s used in and realize additional gains by having the trig function result immediately available in a register.

regards,

– jim

hey cool I didn’t know there is even a whole book about approximations only ;-) talking about preprocessors, what IDE do you use ? I was trying to integrate the C preprocessor into FlashDevelop, but failed at configuring make or ant to use it.

Michael:

Another classic is Cody’s ‘A Software Manual for the Elementary Functions’. There is a relatively new book by Muller – ‘Elementary Functions: Algorithms and Implementation’, but I haven’t read it yet.

I’ve given up on direct integration of other tools to any IDE – don’t have the bandwidth to work out all the issues, so I run awk or another text processor directly on the .AS files — very old school.

Best of luck in your continued (and superb) efforts!

regards,

– jim

Hey thanks for the reference!

Doing University and making a blog isn’t easy but shortly some content will popout =D

Here’s a modified fixed-point version that performs better : http://blog.haxe.org/entry/26

I’ev tested this in AS2, and it’s not faster. So this optimisation trick is ONLY for AS3. Just to let you know.

Thanks. This post was useful and more accurate than using an MS-Excell trendline function (2nd degree). Made minor changes that should speed things up since addition is cheaper than mult usually. This is the c code:

double sin(double rad){

double sine;

if (rad < 0)

sine = rad*(1.27323954 + .405284735 * rad);

else

sine = rad*(1.27323954 – 0.405284735 * rad);

if (sine < 0)

sine = sine*(-0.225 * (sine + 1) + 1);

else

sine = sine*(0.225 * (sine – 1) + 1);

return sine;

}

@DYessick: I tried your version in AS3 and there is no noticeable difference unfortunately. Probably due to the weirdness of AS3.

Overall the high precision approximation gives a nice boost though, although it is closer to ~6.7 faster on my tests. This could depend on a variety of factors I imagine, not least of all your processor architecture.

You wrote:

“[…] itÃ¢â‚¬â„¢s sufficient to compute one quadrant only (0..pi), and the other 3/4Ã¢â‚¬â„¢s […]”

But … 1/4 of 2pi is pi/2 and not pi, so a quadrant is (0..pi/2), i’m wrong ?

Greg

thanks greg, you are right. i have fixed the article.

More over, you can use a Quadrant but an Octant is sufficient due to the symetry (x y, -x y, -x -y, x -y, y x, -y x, -y -x, y -x)

This is about 50% slower than using Math.cos(); ive found…I don’t get it? I’m not using it in a function either.

always compile in release mode and use the release player (not the debug version) for doing benchmarks.

That makes a difference? Damn I didn’t know that…I’ll have to go find my release player, haven’t even used that thing in forever.

This may be a bit outdated – I think Adobe took pretty harsh to your approximation genius and generated a super fast trig table…

Compiling using the Flex 3.2 SDK, I get these results…

iterations: 1,000,000

algorithm: FastMath [ high-quality approximation ]

angle: 1.302888286613767

FastMath.sin( 1.302888286613767 ) = 0.9643253756148902

– Time: 475 ms

…

iterations: 100,000

algorithm: Math [ flash ]

angle: 1.302888286613767

Math.sin( 1.302888286613767 ) = 0.9643267785343497

– Time: 167 ms

…

Can anyone else confirm this? Thanks!

I also made a mistake when typing the last post….The iterations for the Math.sin() test were also = 1,000,000 – *not* 100,000.

hi matt,

I cannot confirm this. make sure you are using the mxmlc compiler with debug=false and run the swf inside a release player. otherwise you get misleading results. with the Flex3.2 SDK the high quality approximation is now almost exactly 10x faster on my machine.

Thanks alot, this really helps. But I was wondering if anyone could tell me where the numbers 0.405284735 and 0.225 came from?

Hi,

I know this is outdated, but these results may help somebody – these tests were done in Flash Lite 2.0 on a Nokia 6121 (I will test on more phones)

Calculating both sin and cos of 100 angles using Math.sin and Math.cos

= 6216 ms

Calculating the same thing using FastMath inline

= 4685 ms

In Flash Lite FastMath is around 25-30% faster

Just a note (it says this in the post too) that wrapping FastMath in a function call negates the speed increase, and you get the same results as just using Math.sin/cos.

And sorry ‘of 100 angles’ should be ‘of 1000 angles’…

for sin, radians 0 <= x <= pi

temp = x * 2 / pi = x * 0.636619…

sin = temp * ( 2 – temp )

sin = sin * ( 0.225 * ( sin – 1 ) + 1 )

without refinement step, maximum error is 0.056 at +- 0.47

with refinement, maximum error is 0.001 at +- 1.38

for degrees, swap pi with 90

sorry, swap pi with 180, so temp = x / 90

Hi

i want draw sine curve same as above but i need text insted of pixel

Thanks for a wonderful post, l ve been looking for such information, I will join jour rss feed now.

Thanks for this. Excellent post.

Does anyone have a similar technique for Atan? I’m scratching my head a bit on this one.

Thanks.

For help with atan/atan2 check out this thread for starters. Hope it helps.

And this fast atan2 with very low error also seems sound.

Man, thanks for taking the time to post this. I’m sure you’ve been getting a few of these generic thank-you’s, but this blog is basically a godsend for us as3 game devs!

I am using C in a different application, but found this useful. I will have to compare this quadratic with my current approximation. Mine, after wrapping the function, this is a modified form using the first two iterations of the Taylor series centered around x=0, but it results in approx 1.5% max error, which is found in the area between pi/4 to pi/2.

y = x + x*x*x*fact3

fact3 = -1/6.25, a constant I have defined for the preprocessor that normalizes sin(pi/2) = 1.0.

Comparing, it’s 3 multiplies & 1 add so we’re probably equal performance, except maybe this form might take advantage of SSE instructions in a more beneficial way ???

I will compare the accuracy of the quadratic method.

Thanks for the post.

your mileage may vary using, obviously, a different language and platform, but for those coding in plain C or C++, the following performs slightly faster than the Low Precision sin/cos and is incredibly more accurate.

above low precision sin: 0.085516% avg error

max error ~ 27%

code below for sin: 0.006660% avg error

max error ~1.5%

both were run computing millions of loops of 1000-step increment from x = 0 to 2*pi.

Below slightly outperforms in speed, greatly exceeds accuracy:

#define MM_PI 3.14159265358979f

#define D_PI 6.28318530717959f

//globals

static const float p2 = MM_PI/2.0f;

static const float mp2 = -p2;

static const float fact3 = 0.148148148148148;//6.75f;

float tmp;

bool sign;

static inline float f_sin(float x) {

float y = 0.0f;

float tmp = 0.0f;

bool sign = 0;

if ((x>D_PI) || (x<-D_PI)) x = fmod(x,D_PI);

if (x MM_PI) {

x = D_PI – x;

sign = 1;

}

if (x <= p2) y = x – x*x*x*fact3;

else {

tmp = x – MM_PI;

y = -tmp + tmp*tmp*tmp*fact3;

}

if (sign) y = -y;

return y;

}

Then for curiosity I decided to compare accuracy the high precision version:

accuracy is very comparable, speed is not much worse than the lower precision version, so the cubic approximation I have posted above seems to be in my opinion a good balance of speed and precision.

Awesome article. Will come again. :-)

Michael, do you know if this still true (AS3 / FlashPlayer 10.2)?

Could you compile your code and benchmark it again??

yes, results are still valid. I’m using those approximations all the time ;)

I tried your fast version against Math lib (FlashDevelop, Flex SDK 4, FP Debugger 10.1)

package TestSuit

{

import flash.display.Sprite;

import flash.events.MouseEvent;

import flash.utils.getTimer;

public class FastMathTest extends Sprite

{

private var Outer:int, Inner:int;

private var i:int, j:int;

private var Angle:Number;

private var Rad:Number;

private var SinA:Number;

private var CosA:Number;

private var SinB:Number;

private var CosB:Number;

private var Time:int;

private var TimeA:int = 0;

private var TimeB:int = 0;

public function FastMathTest()

{

Inner = 100;

Outer = 10000;

stage.addEventListener(MouseEvent.CLICK, RunTest);

}

private function RunTest(e:MouseEvent):void

{

i = Outer

while(--i != 0)

{

Angle = Math.random() * 360;

j = Inner

while(--j != 0)

{

Time = getTimer();

/* Fast Math.sin() & Math.cos() */

Rad = Math.PI / 180 * Angle;

//always wrap input angle to -PI..PI

if (Rad 3.14159265)

Rad -= 6.28318531;

//compute sine

if (Rad 3.14159265)

Rad -= 6.28318531;

if (Rad < 0)

CosA = 1.27323954 * Rad + 0.405284735 * Rad * Rad

else

CosA = 1.27323954 * Rad - 0.405284735 * Rad * Rad;

TimeA += getTimer() - Time;

}

j = Inner

while(--j != 0)

{

Time = getTimer();

SinB = Math.sin(Math.PI / 180 * Angle);

CosB = Math.cos(Math.PI / 180 * Angle);

TimeB += getTimer() - Time;

}

}

trace("Fast", TimeA, SinA, CosA);

trace("Math", TimeB, SinB, CosB);

}

}

}

`Fast 486 -0.8421372695852312 -0.6367761151784564`

Math 420 -0.8114847762502067 -0.5843735602456291

always use the release player for correct results, debug player is a lot slower!

I was wondering if there is an arc sine optimisation (to aim a Display object to your mouse), and I found this aproximation:

asin = x + x * x * x * x * x * 0.4971

But I don’t know if it faster than the Math.asin() function. How would I know this?