

	by cubefox 478 days ago
	> Transformers are typically memory-bandwidth bound during decoding.

	Not in the case of language models, which are typically bound by memory size rather than bandwidth.

1 comment

whimsicalism 478 days ago

nope


cubefox 478 days ago

I assume even this one won't run on an RTX 5090 due to constrained memory size: https://news.ycombinator.com/item?id=43270843


whimsicalism 478 days ago

sure, on consumer GPUs, but that is not what constrains model inference in most actual industry setups. technically, even then you are bound by CPU-GPU memory bandwidth more than by GPU memory size alone, although that is maybe splitting hairs
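
A rough back-of-the-envelope sketch of the distinction being argued here, under loudly stated assumptions: batch size 1, every weight read once per generated token, compute and KV cache ignored. The function name `decode_estimate` and all numbers in the usage lines are illustrative placeholders, not measurements of any real card or model.

```python
def decode_estimate(model_gb, vram_gb, vram_bw_gb_s, host_link_gb_s):
    """Estimate single-stream decode speed from memory traffic alone.

    Assumption: each token requires reading all model weights once.
    Weights that fit in VRAM are read at VRAM bandwidth; anything that
    does not fit is streamed over the (much slower) CPU-GPU link.
    """
    resident_gb = min(model_gb, vram_gb)   # part of the model held in VRAM
    spilled_gb = model_gb - resident_gb    # part that must come from host memory
    seconds_per_token = resident_gb / vram_bw_gb_s + spilled_gb / host_link_gb_s
    return {
        "fits_in_vram": spilled_gb == 0,
        "tokens_per_s": 1.0 / seconds_per_token,
    }


# Illustrative numbers only (assumed, not vendor specs):
# a model that fits is limited by VRAM bandwidth ...
small = decode_estimate(model_gb=16, vram_gb=32, vram_bw_gb_s=1800, host_link_gb_s=64)
# ... while one that does not fit is dominated by the CPU-GPU link.
large = decode_estimate(model_gb=64, vram_gb=32, vram_bw_gb_s=1800, host_link_gb_s=64)
print(small)
print(large)
```

In this toy model, running out of memory size does not remove the bandwidth limit; it moves the bottleneck to the slower host link, which is roughly the point about CPU-GPU bandwidth.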


cubefox 478 days ago

Why are industry setups considered actual while others are not?
