| HN Mirror



	by mysteria 579 days ago
	Even with a PCIe FPGA card you're still going to be memory bound during inference. When running llama.cpp on straight CPU, memory bandwidth, not CPU power, is always the bottleneck. Now if the FPGA card had a large amount of GPU-tier memory, then that would help.
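
A rough back-of-envelope sketch of why bandwidth dominates, assuming each generated token has to stream roughly the whole set of model weights from memory once. The function name and the bandwidth and model-size figures below are illustrative assumptions, not measurements of any particular CPU, GPU, or FPGA card.

```python
def max_tokens_per_second(mem_bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed if every weight is read once per token.

    This ignores caches, batching, and compute time, so real throughput
    would be lower than this estimate.
    """
    if model_size_gb <= 0:
        raise ValueError("model_size_gb must be positive")
    return mem_bandwidth_gb_s / model_size_gb


# Hypothetical numbers chosen only to show the scale of the difference.
cpu_bound = max_tokens_per_second(50.0, 4.0)    # desktop-class DRAM, ~4 GB quantized model
gpu_bound = max_tokens_per_second(1000.0, 4.0)  # GPU-tier memory, same model

print(f"CPU-class memory: ~{cpu_bound:.1f} tok/s upper bound")
print(f"GPU-tier memory:  ~{gpu_bound:.1f} tok/s upper bound")
```

Under these assumptions, adding compute without faster memory leaves the bound unchanged, since only the bandwidth and the model size enter the estimate.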