HN Mirror


	by yencabulator 247 days ago
	> The GPU is significantly faster and it has cuda,

	But (non-batched) LLM processing is usually limited by memory bandwidth, isn't it? Any extra speed the GPU has is not used by current-day LLM inference.

1 comment

Numerlor 245 days ago

I believe only inference is bandwidth limited; prompt processing and other tasks, on the other hand, need the compute. As I understand it, the workstation as a whole is also focused on the local development process before readying things for the datacenters, not just on running LLMs.
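
To put rough numbers on the bandwidth-vs-compute distinction, here is a back-of-the-envelope roofline sketch. Everything in it is an assumption for illustration: the function names are made up, the hardware figures (1 TB/s memory bandwidth, 100 TFLOP/s peak) and the 8B-parameter fp16 model are hypothetical, and it ignores the KV cache, activations, and attention cost. It only models "every weight is read once per forward pass, at about 2 FLOPs per parameter per token".

```python
def decode_tokens_per_sec_upper_bound(param_count, bytes_per_param, mem_bandwidth):
    """Upper bound on non-batched decode speed.

    Each generated token needs one full read of the weights, so the
    ceiling is memory bandwidth divided by model size in bytes.
    """
    model_bytes = param_count * bytes_per_param
    return mem_bandwidth / model_bytes


def arithmetic_intensity(tokens_per_pass, bytes_per_param):
    """FLOPs per byte of weights read (simplified model).

    Assumes ~2 FLOPs (multiply + add) per parameter per token, and that
    the weights are read once per pass no matter how many tokens share it.
    """
    return 2 * tokens_per_pass / bytes_per_param


def is_compute_bound(tokens_per_pass, bytes_per_param, peak_flops, mem_bandwidth):
    """True when the work per byte exceeds what the hardware can feed."""
    machine_balance = peak_flops / mem_bandwidth  # FLOPs per byte
    return arithmetic_intensity(tokens_per_pass, bytes_per_param) > machine_balance


# Hypothetical hardware and model, chosen only to make the arithmetic easy.
PARAMS = 8e9            # 8B parameters (assumption)
BYTES_PER_PARAM = 2     # fp16 weights (assumption)
BANDWIDTH = 1e12        # 1 TB/s (assumption)
PEAK_FLOPS = 1e14       # 100 TFLOP/s (assumption)

decode_bound = decode_tokens_per_sec_upper_bound(PARAMS, BYTES_PER_PARAM, BANDWIDTH)
decode_is_compute_bound = is_compute_bound(1, BYTES_PER_PARAM, PEAK_FLOPS, BANDWIDTH)
prefill_is_compute_bound = is_compute_bound(512, BYTES_PER_PARAM, PEAK_FLOPS, BANDWIDTH)

print(f"decode ceiling: {decode_bound:.1f} tokens/s")
print(f"single-token decode compute bound? {decode_is_compute_bound}")
print(f"512-token prompt pass compute bound? {prefill_is_compute_bound}")
```

Under these assumed numbers, one-token-at-a-time decode does about 1 FLOP per byte read against a machine balance of 100 FLOPs per byte, so the extra compute sits idle. A 512-token prompt pass reuses each weight across all 512 tokens, which is where the compute starts to matter.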
