Hexbyte Hacker News Computers

Intel’s manuals for their x86/x64 processor clearly state that the *fsin* instruction (calculating the trigonometric *sine) *has a maximum error, in round-to-nearest mode, of one unit in the last place. This is not true. It’s not even close.

The worst-case error for the *fsin* instruction for small inputs is actually about 1.37 quintillion units in the last place, leaving fewer than four bits correct. For huge inputs it can be much worse, but I’m going to ignore that.

I was shocked when I discovered this. Both the *fsin* instruction and Intel’s documentation are hugely inaccurate, and the inaccurate documentation has led to poor decisions being made.

The great news is that when I shared an early version of this blog post with Intel they reacted quickly and the documentation is going to get fixed!

I discovered this while playing around with my favorite mathematical identity. If you add a double-precision approximation for *pi* to the *sin()* of that same value then the sum of the two values (added by hand) gives you a quad-precision estimate (about 33 digits) for the value of *pi*. This works because the *sine* of a number very close to *pi* is almost equal to the error in the estimate of *pi*. This is just calculus 101, and a variant of Newton’s method, but I still find it charming.

The *sin()* function in VC++ is accurate enough for this technique to work well. However this technique relies on printing the value of the double-precision approximation of *pi* to 33 digits, and up to VC++ 2013 this doesn’t work – the extra digits are all printed as zeroes. So, you either need custom printing code or you need to wait for Dev 14 which has improved printing code.

So, I tried g++ on 32-bit Linux (Ubuntu 12.04) because I knew that *glibc* handles the printing just fine. However the results weren’t adding up correctly. I eventually realized that the *sin()* function in 32-bit versions of 2.15 *glibc* is just a thin wrappers around *fsin* and this instruction is painfully inaccurate when the input is near *pi*.

## Hexbyte Hacker News Computers Catastrophic cancellation, enshrined in silicon

The first step in calculating trigonometric functions like *sin()* is range reduction. The input number, which Intel says can be up into the quintillion range, needs to be reduced down to between +/- *pi/2.*

Range reduction in general is a tricky problem, but range reduction around *pi* is actually very easy. You just have to subtract the input number from a sufficiently precise approximation of *pi. *The worst-case for range reduction near *pi* will be the value that is closest to *pi*. For double-precision this will be a 53-bit approximation to *pi* that is off by, at most, half a unit in the last place (0.5 ULP). If you want to have 53 bits of accuracy in the result then you need to subtract from an approximation to *pi* that is accurate to at least 106 bits – half the bits will cancel out but you’ll still be left with enough.

But you actually need more than that. The x87 FPU has 80-bit registers which have a 64-bit mantissa. If you are doing long-double math and you want your range reduction of numbers near *pi* to be accurate then you need to subtract from at least a 128-bit approximation to *pi.*

This is still easy. If you know your input number is near *pi* then just extract the mantissa and do high-precision integer math to subtract from a hard-coded 128-bit approximation to *pi*. Round carefully and then convert back to floating-point. In case you want to give this a try I converted *pi* to 192-bits of hexadecimal:

C90FDAA2 2168C234 C4C6628B 80DC1CD1 29024E08 8A67CC74

But Intel doesn’t use a 128-bit approximation to *pi*. Their approximation has just 66 bits. Here it is, taken from their manuals:

C90FDAA2 2168C234 C

Therein lies the problem. When doing range reduction from double-precision (53-bit mantissa) *pi* the results will have about 13 bits of precision (66 minus 53), for an error of up to 2^40 ULPs (53 minus 13). The measured error is lower at approximately 164 billion ULPs (~37 bits), but that’s still large. When doing range reduction from the 80-bit long-double format there will be almost complete cancellation and the worst-case error is 1.376 quintillion ULPs.

Oops.

## Hexbyte Hacker News Computers Actually calculating sin

After range reduction the next step is to calculate *sin*. For numbers that are sufficiently close to *pi* this is trivial – the range-reduced number *is* the answer. Therefore, for *double(pi)* the error in the range reduction is the error in *fsin*, and *fsin* is inaccurate near *pi*, contradicting Intel’s documentation. QED.

Note that *sin(double(pi))* should *not* return zero. The *sine *of* pi* is, mathematically, zero, but since float/double/long-double can’t represent *pi* it would actually be incorrect for *sin()* to return zero when passed an *approximation* to *pi*.

## Hexbyte Hacker News Computers Frustration

It is surprising that *fsin* is so inaccurate. I could perhaps forgive it for being inaccurate for extremely large inputs (which it is) but it is hard to forgive it for being so inaccurate on *pi* which is, ultimately, a very ‘normal’ input to the *sin()* function. Ah, the age-old decisions that we’re then stuck with for decades.

I also find Intel’s documentation frustrating. In their most recent Volume 2: Instruction Set Reference they say that the range for *fsin* is -2^63 to +2^63. Then, in Volume 3: System Programming Guide, section 22.18.8 (Transcendental Instructions) they say “results will have a worst case error of less than 1 ulp when rounding to the nearest-even”. They reiterate this in section 8.3.10 (Transcendental Instruction Accuracy) of Volume 1: Basic Architecture.

Section 8.3.8 (Pi) of the most recent Volume 1: Basic Architecture documentation from Intel discusses the 66-bit approximation for *pi* that the x87 CPU uses, and says that it has been “chosen to guarantee no loss of significance in a source operand”, which is simply not true. This optimistic documentation has consequences. In 2001 an up-and-coming Linux kernel programmer believed this documentation and used it to argue against adding an *fsin* wrapper function to *glibc*.

Similar problems apply to *fcos* near *pi/2*. See this blog post for more details and pretty pictures.

Intel has known for years that these instructions are not as accurate as promised. They are now making updates to their documentation. Updating the instruction is not a realistic option.

I ran my tests on the 2.19 version of *glibc* (Ubuntu 14.04) and found that it no longer uses *fsin*. Clearly the *glibc* developers are aware of the problems with *fsin* and have stopped using it. That’s good because g++ sometimes calculates sin at compile-time, and the compile-time version is far more accurate than *fsin*, which makes for some bizarre inconsistencies when using *glibc* 2.15, which can make floating-point determinism challenging.

## Hexbyte Hacker News Computers Is it a bug?

When Intel’s *fdiv* instruction was found to be flawed it was clearly a bug. The IEEE-754 spec says exactly how *fdiv* should be calculated (perfectly rounded) but imposes no restrictions on *fsin*. Intel’s implementation of *fsin* is clearly weak, but for compatibility reasons it probably can’t be changed so it can’t really be considered a bug at this point.

However the documentation is buggy. By claiming near-perfect accuracy across a huge domain and then only providing it in a tiny domain Intel has misled developers and caused them to make poor decisions. Scott Duplichan’s research shows that at long-double precision the *fsin* fails to meet its guarantees at numbers as small as 3.01626.

For double-precision the guarantees are not met for numbers as small as about 3.14157. Another way of looking at this is that there are tens of billions of doubles near *pi* where the double-precision result from *fsin* fails to meet Intel’s precision guarantees.

The absolute error in the range I was looking at was fairly constant at about 4e-21, which is quite small. For many purposes this will not matter. However the relative error can get quite large, and for the domains where that matters the misleading documentation can easily be a problem.

I’m not the first to discover this by any means, but many people are not aware of this quirk. While it is getting increasingly difficult to actually end up using the *fsin* instruction, it still shows up occasionally, especially when using older versions of *glibc*. And, I think this behavior is interesting, so I’m sharing.

That’s it. You can stop reading here if you want. The rest is just variations and details on the theme.

## Hexbyte Hacker News Computers How I discovered this

I ran across this problem when I was planning to do a tweet version of my favorite *pi/sin()* identity

double pi_d = 3.14159265358979323846;

printf(“pi = %.33fn + %.33fn”, pi_d, sin(pi_d));

I compiled this with g++, confident that it would work. But before sending the tweet I checked the results – I manually added the two rows of numbers – and I was confused when I got the wrong answer – I was expecting 33 digits of pi and I only got 21.

I then printed the integer representation of *sin(pi_d)* in hex and it was 0x3ca1a60000000000. That many zeroes is very strange for a transcendental function.

At first I thought that the x87 rounding mode had been set to float for some reason, although in hindsight the result did not even have float accuracy. When I discovered that the code worked on my desktop Linux machine I assumed that the virtual machine I’d been using for testing must have a bug.

It took a bit of testing and experimentation and stepping into various *sin()* functions to realize that the ‘bug’ was in the actual *fsin* instruction. It’s hidden on 64-bit Linux (my desktop Linux machine is 64-bit) and with VC++ because they both have a hand-written *sin()* function that doesn’t use *fsin*.

I experimented a bit with comparing *sin()* to *fsin* with VC++ to characterize the problem and eventually, with help from Scott Duplichan’s expose on this topic, I recognized the problem as being a symptom of catastrophic cancellation in the range reduction. I’m sure I would have uncovered the truth faster if Intel’s documentation had not filled me with unjustified optimism.

Amusingly enough, if I’d followed my usual practice and declared *pi_d as const* then the code would have worked as expected and I would never have discovered the problem with* fsin!*

## Hexbyte Hacker News Computers Code!

If you want to try this technique for calculating *pi* but you don’t like manually adding 34-digit numbers then you can try this very crude code to do it for you:

void PiAddition() {

double pi_d = 3.14159265358979323846;

double sin_d = sin(pi_d);

printf(“pi = %.33fn + %.33fn”, pi_d, sin_d);char pi_s[100]; char sin_s[100]; char result_s[100] = {};

snprintf(pi_s, sizeof(pi_s), “%.33f”, pi_d);

snprintf(sin_s, sizeof(sin_s), “%.33f”, sin_d);

int carry = 0;

for (int i = strlen(pi_s) – 1; i >= 0; –i) {

result_s[i] = pi_s[i];

if (pi_s[i] != ‘.’) {

char d = pi_s[i] + sin_s[i] + carry – ‘0’ * 2;

carry = d > 9;

result_s[i] = d % 10 + ‘0’;

}

}

printf(” = %sn”, result_s);

}

The code makes lots of assumptions (numbers aligned, both numbers positive) but it does demonstrate the technique quite nicely. Here are some results:

With g++ on 32-bit Linux with *glibc* 2.15:

pi = 3.141592653589793115997963468544185

+ 0.00000000000000012246~~0635382237726~~

= 3.1415926535897932384~~58598850781911~~

With VC++ 2013 Update 3:

pi = 3.1415926535897931

~~00000000000000000~~

+ 0.000000000000000122464679914735320

= 3.1415926535897932~~22464679914735320~~

With VC++ Dev 14 CTP 3 and g++ on 64-bit Linux or glibc 2.19:

pi = 3.141592653589793115997963468544185

+ 0.0000000000000001224646799147353~~21~~

= 3.14159265358979323846264338327950~~6~~

In other words, if the sixth digit of *sin(double(pi))* is four then you have the correct value for *sin()*, but if it is zero then you have the incorrect *fsin* version.

## Hexbyte Hacker News Computers Compile-time versus run-time *sin*

g++ tries to do as much math as possible at compile time, which further complicates this messy situation. If g++ detects that *pi_d* is constant then it will calculate *sin(pi_d)* at compile time, using MPFR. In the sample code above if pi_d is declared as ‘const’ then the results are quite different. This difference between compile-time and run-time sin() in g++ with *glibc* 2.15 can be seen with this code and its non-unity result:

const double pi_d = 3.14159265358979323846;

const int zero = argc / 99;

printf(“%fn”, sin(pi_d + zero) / sin(pi_d));0.999967

Or, this equally awesome code that also prints 0.999967:

const double pi_d = 3.14159265358979323846;

double pi_d2 = pi_d;

printf(“%fn”, sin(pi_d2) / sin(pi_d));

It’s almost a shame that these goofy results are going away in glibc 2.19.

## Hexbyte Hacker News Computers Sine ~zero == ~zero

Some years ago I was optimizing all of the trigonometric functions for an embedded system. I was making them all run about five times faster and my goal was to get identical results. However I had to abandon that goal when I realized that many of the functions, including *sin()*, gave incorrect answers on denormal inputs. Knowing that the *sine* of a denormal is the denormal helped me realized that the original versions were wrong, and I changed my goal to be identical results except when the old results were wrong.

## Hexbyte Hacker News Computers 66 == 68

Intel describes their internal value for *pi* as having 66 bits, but it arguably has 68 bits because the next three bits are all zero, so their internal 66-bit value is a correctly rounded 68-bit approximation of *pi*. That’s why some of the results for *fsin* are slightly more accurate than you might initially expect.

## Hexbyte Hacker News Computers Sin() is trivial (for almost half of input values)

For small enough doubles, the best answer for *sin(x)* is x. That is, for numbers smaller than about 1.49e-8, the closest double to the *sine* of x is actually x itself. I learned this from the glibc source code for sin()*.* Since half of all doubles are numbers below 1.0, and since doubles have enormous range, it turns out that those numbers below 1.49e-8 represent about 48.6% of all doubles. In the glibc *sin()* source code it tests for the top-half of the integer representation of *abs(x)* being less than 0x3e500000, roughly equivalent to <= 1.49e-8, and that represents 48.6% of the entire range for positive doubles (0x80000000). The ‘missing’ 1.4% is the numbers