Hacker News new | ask | show | jobs
by gsmecher 1043 days ago
On FPGAs, a register file probably fits better into distributed RAM than block RAM.

On Xilinx, for example: a 64-bit register file doesn't map efficiently to Xilinx's RAMB36 primitives. You'd need 2 RAMB36 primitives to provide a 64-bit wide memory with 1 write port and 2 read ports, each addressed separately. Only 6% (32 of 512) entries in each RAMB36 are ever addressable. It's this inefficient because ports, not memory cells, are the contented resource and BRAMs geometries aren't that elastic.

A 64-bit register file in distributed RAM, conversely, is a something like an array of DPRAM32 primitives (see, for example, UG474). Each register would still be stored multiple times to provide additional ports, but depending on the fabric, there's less (or no) unaddressed storage cells.

The Minimax RISC-V CPU (https://github.com/gsmecher/minimax; advertisement warning: my project) is what you get if you chase efficient mapping of FPGA memory primitives (both register-file and RAM) to a logical conclusion. Whether this is actually worth hyper-optimizing really depends on the application. Usually, it's not.