Hacker News new | ask | show | jobs
by tptacek 5415 days ago
This is a crazy smart comment. Thanks.

... is anyone really using hardware to do this, by the way?

Hardware regexing at this scale is practically a COTS part, or was when I stopped doing network security products in '05.

2 comments

Unfortunately it's not that simple anymore. You actually have to do most of the order-book-building legwork to figure out if the quote is relevant: every order modification message, such as canceling an order or execution, omits the ticker. Makes sense: after all, the add-quote actually tells you the ticker, so there's no need to repeat that info.

This is where it gets tricky. FPGA based systems don't really give an edge here, and tilera-based solutions are still nascent

This craziness is what happens when you try to implement a specialized algorithm on a general-purpose state-machine. What stupid thing is a "cache" anyway, but an optimization because we are not using real computers but hugely complex state-machines called CPUs. FPGAs are as close as a real computer as we can have today.
The parallels to high end network security are again interesting here; for instance, on the more interesting network processors, memory accesses were asynchronous, and instead of a "cache" you had a manually addressed "scratch" memory with latency comparable to registers.

... because "general purpose cache" wasn't helpful, but "exploitable locality" definitely was.

That's pretty close to how the first NVIDIA GPGPU (G80) worked. There was no cache, but only different kinds of memory, of which one was as fast as registers. Optimal access to RAM memory was possible with a limited set of access patterns, so "caching" had to be managed carefully manually.

In later GPUs they (mainly due to programmer laziness :-) ) switched to a more GPU-like automatically managed general purpose cache. The problem was that in most cases, being able to quickly write code that is reasonably fast was more important to developers than being able to squish out every cycle (though I believe it's still possible to disable the cache and manage it manually...)

The problem with automatic cache is the same as all "intelligent" CPU features, the CPU tries to predict what the program is doing, the programmer has to predict what the CPU will predict to optimize for that CPU. In the next generation of the CPU, the CPU vendor will try to optimize again for programs currently around. In the end, optimizing becomes much more complex.

There's certainly an advantage to keeping the logic "dumb" and simply having the software manage everything.

veyron points this out: http://www.fixnetix.com/articles/display/129/

Personally, I think specialized hardware is a bad idea.