Yep, I think the last proper embedded project I worked on before this one was on an i386EX running at 40 MHz, trying to eke out a 40 kHz sample rate processing the output of a line-scan CCD. I, a brash 22-year-old, declared it impossible to process ~250 pixels in 10 clock cycles (there was DMA and cool dual-port RAM stuff going on). My boss at the time explained the obvious way to approach it, and I tried again; I think I got to 36 kHz or something after pulling out all the stops. Fun times.
The idea that you could just use trig functions and get away with it on a micro is still kinda foreign. :P
The problem is that today, a lot of programmers remember those constraints and ignore the fact that you can often do floating-point math on a modern real-time system and still meet all your deadlines.
Just last week I fixed a bug in such a system: in an effort to speed things up by avoiding floating-point math, the calculations were done with uint32_t values, and the developer(s) didn't notice that in some cases an intermediate result overflowed before the final result was produced.