Hacker News
by pclmulqdq 326 days ago
Pointers and dynamic objects are probably fine given the ability to do indirect loads, which I assume they have. (Side note: I have built B-trees on FPGAs before, and these kinds of data structures are smaller than you think.) The real problem here is raw code size rather than any specific missing capability, as long as the hardware supports those instructions.

In these architectures, assembly instructions take space instead of time. You will have a capacity of roughly 1,000 to 100,000 instructions (counting every branch you might take), and then the chip is full. To get past that limit, you have to store state to RAM and then reconfigure the array to continue computing.
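To make the "instructions cost space" point concrete, here is a minimal sketch, assuming each block of code occupies a fixed number of fabric slots (both branch arms included) and that blocks are packed greedily in program order. The `Block` type, `partition` function, slot counts, and capacity are all hypothetical illustrations, not any real CGRA toolchain.

```python
from dataclasses import dataclass


@dataclass
class Block:
    name: str
    size: int  # hypothetical: fabric slots needed, counting every branch arm


def partition(blocks, capacity):
    """Greedily pack blocks, in order, into configurations of at most
    `capacity` slots. Each boundary between configurations stands for a
    state spill to RAM followed by a fabric reconfiguration."""
    configs = []
    current, used = [], 0
    for b in blocks:
        if b.size > capacity:
            raise ValueError(f"block {b.name} alone exceeds capacity")
        if used + b.size > capacity:
            # Fabric is full: close this configuration and start a new one.
            configs.append(current)
            current, used = [], 0
        current.append(b.name)
        used += b.size
    if current:
        configs.append(current)
    return configs


# Hypothetical program and capacity, chosen only for illustration.
program = [Block("load", 400), Block("filter", 700),
           Block("reduce", 300), Block("store", 200)]
configs = partition(program, capacity=1000)
print(configs)  # [['load'], ['filter', 'reduce'], ['store']]
print(len(configs) - 1, "reconfigurations")  # 2 reconfigurations
```

A greedy in-order packing is the simplest choice; a real compiler would also weigh how much live state crosses each boundary, since that is what has to be spilled.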


Agree that code size is a significant potential issue, and that going out to memory to reprogram the fabric will be costly.

Re: pointers, I should clarify that it’s not the indirection per se that causes problems; it’s the fact that, with (traditional) dynamic memory allocation, the data’s physical location isn’t known ahead of time. It could be cached nearby, or way off in main memory. That makes dataflow operator latencies unpredictable, so you either have to (1) leave a lot more slack in your schedule to tolerate misses, or (2) build more complicated logic into each CGRA core to handle the asynchronicity. And with option 2, you run the risk that the small, lightweight CGRA slices effectively turn into CPU cores.
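To illustrate the cost of option 1, here is a minimal sketch of ASAP (as-soon-as-possible) scheduling over a small pointer-chasing dataflow graph. It assumes a static schedule must budget a fixed latency per op kind; the graph and the latency numbers (2 cycles for an on-chip hit, 100 for a worst-case trip to main memory) are hypothetical, picked only to show how padding for misses stretches the whole schedule.

```python
def asap_schedule(graph, latency):
    """Return the cycle at which each op's result is ready, assuming each op
    starts as soon as all of its inputs are ready (ASAP scheduling).
    `graph` maps op name -> (kind, [input op names]) in topological order."""
    ready = {}
    for op, (kind, inputs) in graph.items():
        start = max((ready[i] for i in inputs), default=0)
        ready[op] = start + latency[kind]
    return ready


# Hypothetical tree-node visit: the second-level loads are indirect, because
# their addresses depend on the result of the first load.
graph = {
    "ld_node": ("load", []),
    "ld_key": ("load", ["ld_node"]),
    "ld_child": ("load", ["ld_node"]),
    "cmp": ("alu", ["ld_key"]),
    "sel": ("alu", ["cmp", "ld_child"]),
}

# Hypothetical latencies in cycles.
near = {"load": 2, "alu": 1}          # data known to be close by
worst_case = {"load": 100, "alu": 1}  # slack budgeted for a miss to main memory

print(asap_schedule(graph, near)["sel"])        # 6
print(asap_schedule(graph, worst_case)["sel"])  # 202
```

If every load is guaranteed to hit a fixed-latency memory, the short schedule is safe; otherwise a static schedule has to assume the long one, or the hardware needs extra logic to wait on late operands.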

Oh, many embedded architectures don't have a cache hierarchy and instead place dynamic objects in a single SRAM, so access latency is the same no matter where an object lives.
Hmm. You'd be able to trade time for that space by using more general configurations that you can dynamically map instruction sequences onto, no?

The mapping wouldn't be as efficient as a bespoke compilation, but it should be able to avoid the configuration swap-outs.

Basically a set of configurations that can be used as an interpreter.
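A rough sketch of that interpreter idea, assuming the fabric is configured once with a fixed set of functional units and a register file, and programs are then streamed in as instructions instead of being compiled into the fabric. The unit set, register names, and instruction format `(op, dst, src_a, src_b)` are hypothetical; the point is only that a longer program costs more steps, not more configured space.

```python
import operator

# Hypothetical overlay: functional units configured into the fabric once.
FUNCTIONAL_UNITS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "min": min,
}


def run_overlay(program, regs):
    """Interpret `program` on the fixed overlay. Each instruction is
    (op, dst, src_a, src_b) naming registers. One instruction issues per
    step, so program length costs time rather than fabric space."""
    regs = dict(regs)  # copy so the caller's register state is untouched
    steps = 0
    for op, dst, a, b in program:
        regs[dst] = FUNCTIONAL_UNITS[op](regs[a], regs[b])
        steps += 1
    return regs, steps


# (x + y) * (x - y), streamed through the same configuration.
program = [
    ("add", "t0", "x", "y"),
    ("sub", "t1", "x", "y"),
    ("mul", "out", "t0", "t1"),
]
regs, steps = run_overlay(program, {"x": 7, "y": 3})
print(regs["out"], steps)  # 40 3
```

A bespoke compilation would lay all three operations out spatially and finish in roughly one pass; the overlay takes one step per instruction but never needs a reconfiguration, however long the program gets.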