	by d_silin 2163 days ago
	I wonder if it is possible to add a (small) FPGA to a personal computer that could accelerate any specific software tasks (video/audio encoding, ML algorithms, compression, extra FPU capabilities) on user demand.

7 comments

jeffreyrogers 2163 days ago

The problem with this will be the overhead of transferring data to and from the FPGA; once you account for it, doing the computation on the CPU often makes more sense. It's obviously not a show-stopper, since GPUs have the same problem and are still useful, but it's hard to find a workload that maps well to this solution.
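
To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes a simple model where offload time is a fixed link latency plus transfer time plus accelerator compute time; the function names and every number are made up for illustration, not measurements of any real board.

```python
def offload_time_s(bytes_moved, link_bytes_per_s, link_latency_s, accel_compute_s):
    """Wall time when offloading: fixed link latency + data transfer + compute.

    Assumed model: one round of transfers, no overlap of transfer and compute.
    """
    return link_latency_s + bytes_moved / link_bytes_per_s + accel_compute_s


def offload_pays_off(cpu_compute_s, **offload):
    """True if the offloaded path beats just doing the work on the CPU."""
    return offload_time_s(**offload) < cpu_compute_s


# Illustrative (made-up) job: 64 MiB moved over an 8 GB/s link,
# 20 us of latency, 2 ms of compute on the accelerator.
job = dict(
    bytes_moved=64 * 1024 * 1024,
    link_bytes_per_s=8e9,
    link_latency_s=20e-6,
    accel_compute_s=0.002,
)

print(offload_pays_off(0.005, **job))  # False: the transfer alone takes longer than the CPU
print(offload_pays_off(0.050, **job))  # True: the CPU is slow enough that offloading wins
```

In this toy model the accelerator is faster at the computation in both cases; the transfer term alone decides whether offloading is worth it.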

derefr 2163 days ago

In a DAW, accelerating a heavy VST plugin might make sense. But often those are amenable to being translated to GPGPU code already.

I guess the one place where GPGPU-based solutions wouldn't work is when the code you want to accelerate is necessarily acting as some kind of Turing machine (i.e. emulation for some other architecture). However, I can't think of a situation where an FPGA programmed with the netlist for arch A, running alongside a CPU running arch B, would make more sense than just getting the arch-B CPU to emulate arch A; unless, perhaps, the instructions in arch A are very, very CISC, perhaps with analogue components (e.g. RF logic, like a cellular baseband modem).

not2b 2163 days ago

This is normally handled in emulation by putting the inner parts of the testbench (the transactors) onto the FPGA as well, to minimize the amount of data that has to be transferred between the CPU and the FPGA. If the FPGA is to be used as a peripheral, again a division of labor needs to be found that minimizes the amount of data that needs to be communicated. But if there is FPGA logic on the same chip as the CPU cores, the overhead can be greatly reduced, and we're seeing more of that now.

deelowe 2163 days ago

I assumed this was kind of Intel's plan when they purchased Altera. I think the issue with this is the amount of time it takes to load the bitstream, but I thought I saw some things recently where progress was being made on this front.

vzidex 2163 days ago

> issue with this is the amount of time it takes to load the bitstream, but I thought I saw some things recently where progress was being made on this front

You saw correctly: work is indeed being done to build "shells" that can accept workloads without the user having to go through the FPGA tooling/build process.

daxfohl 2163 days ago

It's been possible for a long time, but there are big challenges to adoption. Every FPGA is different and the image is tightly coupled to the chip, so you'd have to compile the algorithm specifically for your chip before loading, which can take hours. Then loading the image each time you change out accelerators for a different application can take minutes. Then the software that uses the accelerator would have to know which chip and which image you're running and send data to it accordingly. Then you have to remember that FPGAs sometimes aren't really that great as accelerators, since they run at such low clock speeds, have crummy memory interfaces, limited gate support for floating point or even integer multiplication, etc. CPUs commonly outperform them even at the things they're supposed to be good at.
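
As a rough illustration of the image-swap cost, this sketch estimates how many jobs it takes before loading a new image pays for itself. It assumes a fixed per-swap load time and a fixed per-job time on each side; the function name and the numbers are hypothetical.

```python
import math


def jobs_to_amortize(reconfig_s, cpu_job_s, fpga_job_s):
    """Smallest job count n with reconfig_s + n * fpga_job_s <= n * cpu_job_s.

    Returns None when the FPGA is not faster per job, since the swap then
    never pays off under this (assumed) model.
    """
    saving_per_job = cpu_job_s - fpga_job_s
    if saving_per_job <= 0:
        return None
    return math.ceil(reconfig_s / saving_per_job)


# Made-up numbers: a 2-minute image load, 2.0 s per job on the CPU,
# 0.5 s per job on the FPGA.
print(jobs_to_amortize(120, 2.0, 0.5))  # 80
print(jobs_to_amortize(120, 1.0, 2.0))  # None
```

The hours of compile time per chip would add a further one-time cost on top of this, which the sketch leaves out.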

So it's unlikely ever to gain broad acceptance because the software vendors would have to support such a high number of permutations and the return can be questionable. This is why you see far more accelerators based on ASICs that have higher clock speeds and baked-in circuitry for specific tasks, with standardized APIs.

But sure, there's nothing preventing you from buying an FPGA board, hooking it up to your PC, creating a few images that do the accelerations you want, and writing software that uses them, swapping the image in when your program loads. You could even write a smart driver that swaps the image only if it's not in use by another app, or whatever. It's just unlikely you'll ever find a bunch of third-party software that supports it.
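
The "smart driver" idea could look roughly like this. It is only a sketch: `FpgaImageManager` and the `load_image` callback are hypothetical names, and the callback stands in for whatever vendor tool actually programs the board.

```python
import threading


class FpgaImageManager:
    """Tracks the loaded image and swaps it only when nobody is using it."""

    def __init__(self, load_image):
        self._load_image = load_image  # hypothetical: programs the device with an image
        self._lock = threading.Lock()
        self._current = None           # name of the image currently on the device
        self._users = 0                # number of apps holding the current image

    def acquire(self, image):
        """Try to get the device with `image` loaded. Returns True on success."""
        with self._lock:
            if self._current == image:
                self._users += 1       # already loaded: share it, no swap needed
                return True
            if self._users > 0:
                return False           # a different image is in use by another app
            self._load_image(image)    # device is idle: safe to swap
            self._current = image
            self._users = 1
            return True

    def release(self):
        """Signal that one user is done with the current image."""
        with self._lock:
            if self._users > 0:
                self._users -= 1


# Usage with a stand-in loader that just records which images were loaded.
loaded = []
mgr = FpgaImageManager(loaded.append)
mgr.acquire("fft")         # loads "fft"
mgr.acquire("fft")         # shares the loaded image, no reload
print(mgr.acquire("aes"))  # False: "fft" is still in use
```

A caller that gets `False` back would fall back to the CPU path or retry later; that policy is left out here.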

tails4e 2163 days ago

There absolutely is. There are PCIe cards you can plug in and use as accelerators, just like you would use a GPU. Of course, programming them to do the task you want is harder, but they can do anything. I saw a great example where someone implemented memcached on a single plug-in FPGA card and replaced many Xeons with it.

sod 2163 days ago

Isn't that what Apple did with the Afterburner card for the Mac Pro? I read in https://www.anandtech.com/show/15646/apple-now-offering-stan... that the card is an FPGA.

I could imagine that Apple will include something like this in their Apple Silicon SoC for ARM Macs.

The Afterburner card is not user-programmable, but maybe it will be in the future, and this was just the first try to get the hardware in the field.

rustybolt 2163 days ago

Yes, and it has been done. There are FPGAs that you can connect to over PCIe, and you only have to pay the small price of writing an FPGA implementation for your use case. It usually takes just a couple of weeks (OK, maybe months).

SomeoneFromCA 2163 days ago

You might actually go even faster than PCIe by pretending to be a DDR4 memory stick.

geforce 2163 days ago

IIRC some CPUs of the Intel Atom series already have an embedded FPGA.

duskwuff 2163 days ago

Intel has launched a couple of Xeon Gold CPUs (like a variant of the 6138P) with integrated FPGAs for specific markets. Nothing mass-market, though, and they don't seem to have caught on much.
