Hacker News new | ask | show | jobs
by aortega 5414 days ago
This craziness is what happens when you try to implement a specialized algorithm on a general-purpose state-machine. What stupid thing is a "cache" anyway, but an optimization because we are not using real computers but hugely complex state-machines called CPUs. FPGAs are as close as a real computer as we can have today.
1 comments

The parallels to high end network security are again interesting here; for instance, on the more interesting network processors, memory accesses were asynchronous, and instead of a "cache" you had a manually addressed "scratch" memory with latency comparable to registers.

... because "general purpose cache" wasn't helpful, but "exploitable locality" definitely was.

That's pretty close to how the first NVIDIA GPGPU (G80) worked. There was no cache, but only different kinds of memory, of which one was as fast as registers. Optimal access to RAM memory was possible with a limited set of access patterns, so "caching" had to be managed carefully manually.

In later GPUs they (mainly due to programmer laziness :-) ) switched to a more GPU-like automatically managed general purpose cache. The problem was that in most cases, being able to quickly write code that is reasonably fast was more important to developers than being able to squish out every cycle (though I believe it's still possible to disable the cache and manage it manually...)

The problem with automatic cache is the same as all "intelligent" CPU features, the CPU tries to predict what the program is doing, the programmer has to predict what the CPU will predict to optimize for that CPU. In the next generation of the CPU, the CPU vendor will try to optimize again for programs currently around. In the end, optimizing becomes much more complex.

There's certainly an advantage to keeping the logic "dumb" and simply having the software manage everything.