Bring me the Head of a GCC Hacker

Posted on Thu 24 August 2006 by alex in geek

Anyone who doesn't grok floating point, assembler and gcc should probably look away now. This may fry your brain; it certainly hasn't been healthy for mine.

So I've been trying to debug a natty little problem: why my little assembler stub didn't generate a SIGFPE/FPE_FLTINV when it tried to add two signalling NaNs together.
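For anyone who wants to see the expected behaviour in isolation, here is a minimal sketch. It is not my actual stub: it assumes a target where float addition is done in hardware (SSE on x86_64, for instance), and it checks the sticky FE_INVALID flag from <cfenv> rather than unmasking the exception and catching the SIGFPE. The helper names float_from_bits and add_raises_invalid are made up for illustration.

```cpp
#include <cassert>
#include <cfenv>
#include <cstdint>
#include <cstring>

// Hypothetical helper: build a float from a raw bit pattern
// without any numeric conversion.
float float_from_bits(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: add two floats given as raw bits and report
// whether the invalid-operation flag was raised by the addition.
bool add_raises_invalid(std::uint32_t a_bits, std::uint32_t b_bits) {
    // volatile stops the compiler folding the addition away at compile time.
    volatile float a = float_from_bits(a_bits);
    volatile float b = float_from_bits(b_bits);
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile float sum = a + b;
    (void)sum;
    return std::fetestexcept(FE_INVALID) != 0;
}
```

With the invalid exception unmasked, that same flag is what would turn into a SIGFPE with si_code FPE_FLTINV. Adding two quiet NaNs, or two ordinary numbers, should leave it clear.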

Like any good programmer with a nice repeatable test case to hand, I bring fprintf out to probe what's going on. I add the line:

fprintf(stderr,"faddRRR 0x%x + 0x%x\n", bit_cast<u32int>(a), bit_cast<u32int>(b));

to my routine to check the numbers are indeed what they say they are. The observant amongst you may note that a and b are not touched; they are merely converted to unsigned integers in situ before fprintf does its stuff.
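In case bit_cast is unfamiliar, here is a rough sketch of what such a helper usually looks like. This is not the product's real implementation: I'm assuming u32int is a 32-bit unsigned typedef, and format_fadd is a made-up wrapper that builds the same debug line into a buffer so it can be checked.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Assumed stand-in for the u32int typedef used in the fprintf line.
typedef std::uint32_t u32int;

// Reinterpret the bytes of `from` as a To, with no numeric conversion.
// memcpy is the portable way to do this; compilers usually reduce it
// to a plain register or memory move.
template <typename To, typename From>
To bit_cast(const From &from) {
    static_assert(sizeof(To) == sizeof(From), "bit_cast needs equal sizes");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Hypothetical wrapper: format the debug line into buf instead of stderr.
int format_fadd(char *buf, std::size_t len, float a, float b) {
    return std::snprintf(buf, len, "faddRRR 0x%x + 0x%x\n",
                         bit_cast<u32int>(a), bit_cast<u32int>(b));
}
```

The point is that nothing here does floating point arithmetic on a or b; the bits are copied and printed as integers.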

Lo and behold, as soon as this fprintf is added my program starts throwing the correct signals again. "That's a little odd" I think. After all it's printing out the right values ("faddRRR 0x7fbfffff + 0x7fbfffff") so it should be working. I spend a while reverting all the other changes until I confirm I really can just comment/un-comment that one line to change my program from working to broken.

We have now reached the point to break out gdb and trace at the instruction level. I feel I should point out here that the product I'm working on runs on X86_64 chips, and we make extensive use of SSE2 for our floating point operations. My understanding of this area is basically what I have picked up writing SSE2 stubs for various floating point operations. We know the X87 exists, but let's face it, who would use it when SSE2 is faster and easier?

So I comment out the fprintf again and start tracing through the code:

(gdb) i
0x701a9500: mov    (%rdx,%rax,4),%eax
(gdb) p/x $rax
$2 = 0x7fbfffff
(gdb) i
0x701a9503: mov    %eax,0x108(%rsp)
(gdb) p/x $rsp + 0x108
$3 = 0x7fbffff128
(gdb) x/w $3
0x7fbffff128:   0x7fbfffff
(gdb) i
0x701a950a: flds   0x108(%rsp)

"Hmmm, what's that do, I wonder?" (cue rifling through the manuals).
"Well that seems fair enough, let's just check something"
(gdb) info registers float
st0            nan(0xffffff0000000000)  (raw 0x7fffffffff0000000000)

"$@$@!!!!nnngghhh"
(gdb) i
0x701a9511: fstps  0x20(%rsp)
"And it doesn't even do anything with it!!!!!!!!!!!!!!!!!"

So I try and understand what's just happened and why an fprintf makes a difference. The mystery of the signalling NaN is easy. The X87 does all its maths at extended (80-bit) precision, which is why st0 shows up with that long raw value. As exceptions aren't enabled at this point it doesn't complain about loading a signalling NaN into its registers. Instead the flds sets the quiet bit while widening the value, so by the time the fstps stores it again it is a quiet NaN. It obviously thinks the signalling bit is no longer of interest.
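In bit terms, here is a rough sketch of how to tell the two kinds of NaN apart in single precision. It assumes the x86 convention that the top fraction bit is the quiet bit, and quieted() only models what I believe a masked X87 load does to a signalling NaN; the names are mine, not from any library.

```cpp
#include <cassert>
#include <cstdint>

// IEEE 754 single precision: 1 sign bit, 8 exponent bits, 23 fraction bits.
const std::uint32_t kExponentMask = 0x7f800000u;
const std::uint32_t kFractionMask = 0x007fffffu;
const std::uint32_t kQuietBit     = 0x00400000u; // top fraction bit (x86 convention)

// A NaN has an all-ones exponent and a non-zero fraction.
bool is_nan(std::uint32_t bits) {
    return (bits & kExponentMask) == kExponentMask &&
           (bits & kFractionMask) != 0;
}

// A signalling NaN is a NaN with the quiet bit clear.
bool is_signalling_nan(std::uint32_t bits) {
    return is_nan(bits) && (bits & kQuietBit) == 0;
}

// Model of the quieting step: NaNs get the quiet bit set,
// everything else passes through untouched.
std::uint32_t quieted(std::uint32_t bits) {
    return is_nan(bits) ? (bits | kQuietBit) : bits;
}
```

Under those assumptions my test value 0x7fbfffff is signalling, and quieting it gives 0x7fffffff, which no longer traps when added.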

The interesting thing is why the fprintf makes the difference. So out again with gdb:

0x701a8e73 : mov    (%rdx,%rax,4),%eax
0x701a8e76 : mov    %eax,0x108(%rsp)
0x701a8e7d : mov    0x108(%rsp),%r15d
0x701a8e85 : mov    $0x3f,%eax
0x701a8e8a : sub    %esi,%eax
0x701a8e8c : add    $0xf8,%eax
0x701a8e91 : cltq
0x701a8e93 : mov    (%rcx),%rdx
0x701a8e96 : mov    (%rdx,%rax,4),%eax
0x701a8e99 : mov    %eax,0x108(%rsp)
0x701a8ea0 : mov    0x108(%rsp),%r14d
0x701a8ea8 : mov    %eax,%ecx
0x701a8eaa : mov    %r15d,0x108(%rsp)
0x701a8eb2 : mov    0x108(%rsp),%edx
0x701a8eb9 : mov    $0x702f10a1,%esi
0x701a8ebe : mov    2989739(%rip),%rdi        # 0x70482d70
0x701a8ec5 : xor    %eax,%eax
0x701a8ec7 : callq  0x700a3eb0 fprintf@plt
..
..
0x701a8fda : mov    %r15d,0xc(%rsp)
0x701a8fdf : movss  0xc(%rsp),%xmm1
0x701a8fe5 : mov    %r14d,0xc(%rsp)
0x701a8fea : movss  0xc(%rsp),%xmm0
0x701a8ff0 : mov    %r12,%rdi
0x701a8ff3 : callq  0x701b8024 <potentialFaulty_fadds(float, float)>

So the very act of calling fprintf alters gcc's register allocation: instead of bouncing the floating point numbers through the X87 register stack, it keeps them hanging around in the copious number of integer registers it has handy. As you can see, the calling convention for potentialFaulty_fadds is to use the SSE registers. As far as I can tell it's just using the X87 registers for shits and giggles.

And breaking my code :-(

I'm filing this bug under "tricky"