Anyone who doesn't grok floating point, assembler and gcc should probably look away now. This may fry your brain; it certainly hasn't been healthy for mine.
So I've been trying to debug a natty little problem: why my little assembler stub didn't generate a SIGFPE/FPE_FLTINV when it tried to add two signalling NaNs together.
Like any good programmer with a nice repeatable test case to hand, I bring fprintf out to probe what's going on. I add the line:
fprintf(stderr, "faddRRR 0x%x + 0x%x\n", bit_cast<u32int>(a), bit_cast<u32int>(b));
to my routine to check the numbers are indeed what they say they are. The observant amongst you may note that a and b are not touched; they are merely converted to unsigned integers in situ before fprintf does its stuff.
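For anyone wondering what that bit_cast is up to, here is a minimal sketch of how such a helper could look. It is an assumption on my part that it copies the bytes with memcpy and that u32int is a plain 32-bit unsigned type; the real helper in the product may well differ, and trace_operands is a made-up name just for this illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <type_traits>

typedef std::uint32_t u32int;  // assumption: u32int is a 32-bit unsigned integer

// Hypothetical stand-in for bit_cast: copy the object representation
// byte for byte, so the value is reinterpreted rather than converted.
template <typename To, typename From>
To bit_cast(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "types must be trivially copyable");
    To to;
    std::memcpy(&to, &from, sizeof to);
    return to;
}

// Same shape as the debug line: print the raw bits of both operands.
void trace_operands(const float& a, const float& b) {
    std::fprintf(stderr, "faddRRR 0x%x + 0x%x\n",
                 bit_cast<u32int>(a), bit_cast<u32int>(b));
}
```

The point is that nothing here does any floating point arithmetic on a or b; the floats are only ever looked at as 32 bits of memory.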
Lo and behold, as soon as this fprintf is added my program starts throwing the correct signals again. "That's a little odd," I think. After all, it's printing out the right values ("faddRRR 0x7fbfffff + 0x7fbfffff"), so it should be working. I spend a while reverting all the other changes until I confirm I really can just comment/un-comment that one line to change my program from working to broken.
We have now reached the point where it's time to break out gdb and trace at the instruction level. I should point out here that the product I'm working on runs on X86_64 chips, and we make extensive use of SSE2 for our floating point operations. My understanding of this area is basically what I have picked up writing SSE2 stubs for various floating point operations. We know the X87 exists, but let's face it, who would use it when SSE2 is faster and easier?
(gdb)i 0x701a9500: mov (%rdx,%rax,4),%eax
(gdb) p/x $rax
$2 = 0x7fbfffff
(gdb)i 0x701a9503: mov %eax,0x108(%rsp)
(gdb) p/x $rsp + 0x108
$3 = 0x7fbffff128
(gdb) x/w $3
0x7fbffff128: 0x7fbfffff
(gdb)i 0x701a950a: flds 0x108(%rsp)

"Hmmm, what's that do, I wonder?" (cue rifling through the manuals). "Well, that seems fair enough, let's just check something."

(gdb) info registers float
st0 nan(0xffffff0000000000) (raw 0x7fffffffff0000000000)

"$@$@!!!!nnngghhh"

(gdb)i 0x701a9511: fstps 0x20(%rsp)

"And it doesn't even do anything with it!!!!!!!!!!!!!!!!!"
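So I try and understand what's just happened and why an fprintf makes a difference. The mystery of the signalling NaN is easy. The X87 keeps everything in its registers in its own 80-bit extended format, so flds has to convert my 32-bit float on the way in. Converting a signalling NaN counts as an invalid operation, but as exceptions aren't enabled at this point it doesn't complain; it just sets the quiet bit and carries on, which is exactly what the raw st0 value above shows. By the time fstps writes it back out, my signalling NaN has become a quiet one.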
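To make the bit twiddling concrete, here is a rough sketch of the single precision NaN encoding on x86, where the top fraction bit is the one that separates quiet from signalling. The names (is_nan, is_signalling_nan, quieten) are made up for this illustration, and quieten is only a model of what a load with exceptions masked does to a NaN, not anything taken from the product.

```cpp
#include <cstdint>

// IEEE 754 single precision: 1 sign bit, 8 exponent bits, 23 fraction bits.
constexpr std::uint32_t kExpMask  = 0x7f800000u;  // all exponent bits
constexpr std::uint32_t kFracMask = 0x007fffffu;  // all fraction bits
constexpr std::uint32_t kQuietBit = 0x00400000u;  // top fraction bit

// A NaN has an all-ones exponent and a non-zero fraction.
constexpr bool is_nan(std::uint32_t bits) {
    return (bits & kExpMask) == kExpMask && (bits & kFracMask) != 0;
}

// A signalling NaN is a NaN with the quiet bit clear.
constexpr bool is_signalling_nan(std::uint32_t bits) {
    return is_nan(bits) && (bits & kQuietBit) == 0;
}

// Model of the quieting: set the quiet bit, keep the rest of the payload.
// Anything that is not a NaN passes through untouched.
constexpr std::uint32_t quieten(std::uint32_t bits) {
    return is_nan(bits) ? (bits | kQuietBit) : bits;
}
```

Feed it the value from the trace and is_signalling_nan(0x7fbfffff) is true, while quieten(0x7fbfffff) gives 0x7fffffff, a perfectly quiet NaN that will never raise FPE_FLTINV on an add.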
With the fprintf in place, the same stretch of code comes out like this:

0x701a8e73: mov (%rdx,%rax,4),%eax
0x701a8e76: mov %eax,0x108(%rsp)
0x701a8e7d: mov 0x108(%rsp),%r15d
0x701a8e85: mov $0x3f,%eax
0x701a8e8a: sub %esi,%eax
0x701a8e8c: add $0xf8,%eax
0x701a8e91: cltq
0x701a8e93: mov (%rcx),%rdx
0x701a8e96: mov (%rdx,%rax,4),%eax
0x701a8e99: mov %eax,0x108(%rsp)
0x701a8ea0: mov 0x108(%rsp),%r14d
0x701a8ea8: mov %eax,%ecx
0x701a8eaa: mov %r15d,0x108(%rsp)
0x701a8eb2: mov 0x108(%rsp),%edx
0x701a8eb9: mov $0x702f10a1,%esi
0x701a8ebe: mov 2989739(%rip),%rdi # 0x70482d70
0x701a8ec5: xor %eax,%eax
0x701a8ec7: callq 0x700a3eb0 <fprintf@plt>
..
..
0x701a8fda: mov %r15d,0xc(%rsp)
0x701a8fdf: movss 0xc(%rsp),%xmm1
0x701a8fe5: mov %r14d,0xc(%rsp)
0x701a8fea: movss 0xc(%rsp),%xmm0
0x701a8ff0: mov %r12,%rdi
0x701a8ff3: callq 0x701b8024 <potentialFaulty_fadds(float, float)>
So the very act of calling fprintf alters gcc's register allocation: instead of passing the floating point numbers through the X87 register stack, it keeps them hanging around in the copious number of integer registers it has handy (r14d and r15d above). As you can see, the calling convention for potentialFaulty_fadds is to use the SSE registers. As far as I can tell it's just using the X87 registers for shits and giggles.
And breaking my code :-(
I'm filing this bug under "tricky"