by thrownaway2424 3949 days ago
Why would an FPGA be a better solution than software running on another core?
3 comments

.. while keeping in mind that for some workloads GC is a large chunk of the app's work. For example, 10 Xeon cores' worth of GC throughput (out of say 24) would be a pretty tall order for an FPGA, and as a fixed resource it easily becomes an Amdahl's law bottleneck.
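To put rough numbers on the bottleneck argument, here is a minimal back-of-the-envelope sketch. It assumes the collector runs fully concurrently with the application (so elapsed time is set by whichever side finishes last) and that the offload unit delivers a fixed GC throughput measured in "Xeon-core equivalents". The function name `throughput_speedup` and the figure of 2 core-equivalents are illustrative assumptions, not measurements of any real FPGA.

```python
def throughput_speedup(gc_fraction, app_cores, gc_unit_cores):
    """Speedup relative to one core doing all the work.

    gc_fraction   -- share of total work that is GC (e.g. 10/24)
    app_cores     -- cores running application (mutator) code
    gc_unit_cores -- fixed GC throughput of the offload unit,
                     in core-equivalents (hypothetical figure)
    Assumes app work and GC work overlap perfectly.
    """
    app_time = (1.0 - gc_fraction) / app_cores
    gc_time = gc_fraction / gc_unit_cores
    # Whichever side is slower determines elapsed time.
    return 1.0 / max(app_time, gc_time)


gc_fraction = 10 / 24  # the "10 cores out of 24" example

# Offload unit matches 10 cores of GC throughput: no bottleneck.
print(throughput_speedup(gc_fraction, app_cores=14, gc_unit_cores=10))

# Offload unit worth only 2 cores: GC becomes the limit.
print(throughput_speedup(gc_fraction, app_cores=14, gc_unit_cores=2))

# Doubling the application cores does not help once GC is the limit.
print(throughput_speedup(gc_fraction, app_cores=28, gc_unit_cores=2))
```

Under these assumptions, once the offload unit's throughput falls below what the workload needs, adding application cores stops improving overall speed, which is the Amdahl-style ceiling described above.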

It would be a cool thing to try still, and maybe doable with COTS hw: https://www-ssl.intel.com/content/www/us/en/embedded/technol... + http://www.hotchips.org/wp-content/uploads/hc_archives/hc21/...

It's a tall order because you set it up to be. Real system design would call for a balancing act, as usual. Remember that you can put a bunch of GCs on one FPGA that all run concurrently with access to shitloads of I/O and/or a fast memory bus. Amdahl's law shouldn't kick in any more than with concurrent GCs in general. The parallelism, simplicity, and tech like in your link should make it faster than an on-board collector. The concept isn't speculation as it's already been done in two different ways:

Fine-Grained Parallel Compacting Garbage Collection through Hardware-Supported Synchronization (2010) http://www.ikr.uni-stuttgart.de/Content/Publications/Archive...

Stall-free, real-time collector for FPGA's (2012) http://researcher.watson.ibm.com/researcher/files/us-bacon/B...

The question is, "Can modern CPUs and off-chip FPGAs keep in sync without performance getting dragged down?" The FPGAs have gotten faster. The CPUs' I/O has gotten faster. So, I'm sure it can be done but it might be difficult enough to be someone's Master's thesis. ;)

Besides, I call for replacing current chips with open ones that are easy to modify for acceleration and security. Gaisler LEON4 SPARC, Rocket RISC-V, Cambridge's BERI/CHERI MIPS64... these all come to mind. The plan was to put them onto a high-end FPGA w/ concurrent GCs to test the scheme. Once it worked, ASIC conversion time baby. S-ASICs are $200-500k on average with resulting production & packaging being way cheaper after that. Just hoping there's a few companies that would split the cost to eliminate most memory and control flow issues forever. ;)

I'm guessing since it's sitting on the memory bus it could intercept pointer modifications and synchronously update its graph?
Memory bus is just for speed. You don't want it doing stuff like that lol. See these two LISP machines for where my inspiration of putting it on memory bus came from:

http://diyhpl.us/~bryan/papers2/paperbot/Design%20of%20a%20L...

(See section 7 for a radical... err really old... way to do concurrent GC. Full paper available at ACM/IEEE or if you Google LISP processors Guy Steele enough.)

Scheme machine by Burger 1995

http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=569...

(See the main graphic and the specifics in the storage section later. Once again, GC-like stuff is handled in the memory management part of the processor. This processor knows that, though, to assist GC a bit. Also different in that it was specified and then synthesized to heterogeneous hardware with the DDD "correct-by-construction" toolkit.)

So, have fun with those. Plus, Google hardware-assisted or hardware garbage collection to get lots of interesting results already done.

See my reply to anarcticpuffin.