| HN Mirror



	by segmondy 778 days ago
	With 144GB of GPU memory, the most context I can load for llama3 is 232k tokens.

1 comments

wkat4242 778 days ago

Which llama3 is that? 8b or 70b? And what kind of quantisation?

Just wondering. I'll never have those kinds of resources (well, not in the next 5 years), but I'm just trying to put it into perspective.


segmondy 778 days ago

8B, and it got better this morning: they merged in flash attention, so I can now load almost 500k tokens with 96GB of VRAM. That said, you could possibly have this kind of resource. This is a cheap build, a mixture of old and used GPUs.
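
For perspective on the numbers above, here is a rough sketch of how KV cache size grows with context length. It assumes the published Llama 3 8B shape (32 layers, 8 key/value heads, head dimension 128) and an unquantised 16-bit cache; the function names are made up for illustration. It counts only the KV cache, not the model weights or the attention compute buffers, so it will not reproduce the exact figures in this thread.

```python
def kv_cache_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Bytes of KV cache needed per token.

    Defaults are assumptions: Llama 3 8B shape with a 16-bit (2-byte) cache.
    """
    # Factor of 2 because every layer stores both a key and a value vector.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def kv_cache_bytes(n_tokens, **shape):
    """Total KV cache bytes for a context of n_tokens."""
    return n_tokens * kv_cache_bytes_per_token(**shape)


if __name__ == "__main__":
    # Context sizes mentioned in the thread.
    for n_tokens in (232_000, 500_000):
        gib = kv_cache_bytes(n_tokens) / 2**30
        print(f"{n_tokens:>7} tokens -> {gib:.1f} GiB of KV cache")
```

Under these assumptions the cache costs 128 KiB per token, so context length, not model size, dominates memory at these scales. A quantised cache (fewer bytes per element) would shrink the estimate proportionally.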
