Hacker News new | ask | show | jobs
by clarry 3688 days ago
We're still doing plenty of graphics on the CPU. For many applications, I'd prefer that it stay that way. And it doesn't seem like CPUs are becoming massively parallel anytime soon.
1 comments

The CPUs still had been doing floating point math much faster than flow control for the past 20 years or so.
Can you point me to relevant examples of line drawing or related things (e.g. triangle or polygon filling) that is faster to do in floating point (which may eventually need to convert to integer for addressing) than the old school way? Besides, it's not like the line drawing needs much in the control flow aside from a bit of setup and then a loop. Which you'll need with floating point too.

And if you're e.g. filling a triangle? Going the barycentric way, you have a relatively speaking big amount of floating point operations per pixel, in addition to the loop, plus some initial setup. And you might have to compare signs...

I don't believe it's faster. But I'm open to being shown wrong.

Sure, go find a Bresenham line drawing first, verify that it does flow control for every step (a pixel) then compare it to the naive implementation (y = kx + a). Loops are not as expensive as what Bresenham does since even the simple branch prediction works fine for them and there are even better ways the CPU can deal with a loop on newer CPUs. Bresenham's switch is inherently unpredictable since it's comparing accumulated errors.
Back in the mid 80s I rediscovered a form of Bresenham on PDP-11 assembler that used integer overflow to avoid branches. I've forgotten the details, but I remember that it was integer only, and used an accumulator word which, on overflow, became a small number (somehow triggering a change in the dependent coordinate). Damn I wish I could remember the details. One of these days I'll dig up the assembler specs and re-derive it.

Or maybe I'll discover I was using a branch after all!

The whole point of Bresenham is choosing between two integer values to minimize the accumulated error. You can totally implement any conditional computation without explicit conditional branch instructions e.g. using jump tables, conditional moves or predication etc. But the nature of your algorithm will remain conditional and a modern CPU still won't be able to process it as fast as a naive implementation.