Hacker News new | ask | show | jobs
by groby_b 834 days ago
"20-30 sounds crazy low even for O(n²), though."

I mean, 30 means 2^30 sig digits. That means any operation on those are a bit expensive in pure memory throughput. Squaring that, let's use an FFT then it's just O(n log n) for the next step, with n being number of digits here. Or 30 * 2^30 ops, let's call it 2^35 ops. We're at 32 MFlops for a single dot. Let's only do a 1280*768 image, that's 32 GFlops to compute that image.

That's a perfectly nice rate, let's pretend it's half a second because all memory access was just perfectly sequential. (It's not, it's more).

Each additional step more than doubles the time spent. Which means even at 6 more iterations, we're staring at half a minute compute time. Nobody waits half a minute for anything. (Unless it's a web page downloading JS)

Now add the fact that I lied, memory access can't be fully sequential here. But even if it was, we're at least spilling into L3 cache, and that means some 40 cycles penalty every time we do. We'll do that a lot. So multiply time with 10 or so (I'm still hopelessly optimistic we have some efficiencies.), and suddenly even 30 iterations cost you 5 seconds.

And so, yes, it's really expensive even for modern PCs.

If you have a GPU, you can shuffle it off to the GPU, they're pretty good at FFTs - but even that's going to buy you, if we're really really optimistic, another 20 or so iterations before it bogs down. Doubling digits every cycle will make any hardware melt quickly.

1 comments

Ah, of course, not only does it require O(n²) work to multiply, but n grows to n² for each iteration (n bits -> 2n bits). So that's why you get O(2^n) _total_ time.