
by jwr 4465 days ago

That's not good advice. Whether you need to use assembly depends on the particular situation at hand.

Here's a practical example: by redesigning an algorithm to use fixed-point arithmetic and implementing it in assembly, I got it to run 600x faster than the initial C version. Big O complexity was the same; the difference was in the constant factor. But the constant factor matters! In my case, it meant that you could get your computation done in half a day instead of a year.

Yes, it took me 3 weeks to get the algorithm implemented instead of a single day, but even so, it was definitely worth it. And in many cases even a 3-fold improvement in speed is important if you have long-running calculations.

1 comments

graphene 4465 days ago

That bit about fixed point is extremely interesting. I found your blog post about the project you're (I think) referring to (http://jan.rychter.com/enblog/2009/12/4/x86-assembly-encount...), but it doesn't mention the fixed-point part.

Not knowing too much about processor architecture, I don't understand how fixed point can be much faster, since floating-point ops are implemented in hardware. I presume you used integer operations on your fixed-point values, but could you explain a bit why it ends up being much faster than floating point?


jwr 4465 days ago

It all depends on how precise your fixed point values need to be. If you can squeeze them into 8 bits (I could), you can use SSE 128-bit registers to operate on 16 values at a time. It gets even better with AVX, although that wasn't available to me at the time.

So the speedup is not just from going to fixed point, but from managing to use the vector instructions.
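
To make the "16 values at a time" point concrete, here is a minimal sketch in C with SSE2 intrinsics rather than hand-written assembly. It is not the code from the project discussed above: the unsigned 0.8 fixed-point format (value = raw / 256) and the names `to_fixed8` and `add_sat_u8` are assumptions chosen for illustration. It shows a single saturating-add instruction processing sixteen 8-bit values per 128-bit register, with a scalar loop for the tail and for builds without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical format: unsigned 0.8 fixed point, value = raw / 256.0.
   Inputs outside [0, 1) are clamped to the representable range. */
static uint8_t to_fixed8(double x) {
    if (x <= 0.0) return 0;
    double scaled = x * 256.0 + 0.5; /* round to nearest */
    return scaled >= 255.0 ? (uint8_t)255 : (uint8_t)scaled;
}

/* dst[i] = min(a[i] + b[i], 255) for i in [0, n). */
static void add_sat_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                       size_t n) {
    assert(dst != NULL && a != NULL && b != NULL);
    size_t i = 0;
#if defined(__SSE2__)
    /* Each iteration handles 16 values with one saturating add (paddusb). */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    /* Scalar tail, and the whole array when SSE2 is not available. */
    for (; i < n; i++) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = sum > 255u ? (uint8_t)255 : (uint8_t)sum;
    }
}
```

The same loop on 32-bit floats would fit only 4 values per 128-bit register, so the 8-bit version does 4x the work per instruction before any other tuning. That is only part of a large speedup like the one described above, and it only applies when 8 bits of precision are enough.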
