| HN Mirror



	by wkat4242 778 days ago
	Which Llama 3 is that, 8B or 70B? And what kind of quantisation? Just wondering. I'll never have those kinds of resources (well, not in the next 5 years), but I'm trying to put it into perspective.

1 comment

segmondy 778 days ago

8B, and it got better this morning: they merged in flash attention, so I can now load almost 500k tokens (with 96 GB of VRAM). That said, you can possibly have this kind of resource; this is a cheap build, a mixture of old and used GPUs.
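
For perspective on the 500k-token figure, here is a rough back-of-the-envelope sketch of the KV cache size. It assumes the published Llama 3 8B shape (32 layers, 8 KV heads via grouped-query attention, head dimension 128) and an unquantized fp16 cache; the thread does not say which quantisation or cache type was actually used, so treat the result as an estimate, not a measurement of this build. The function name `kv_cache_bytes` is made up for illustration.

```python
def kv_cache_bytes(tokens, layers=32, kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Estimate KV cache size in bytes for a given context length.

    Defaults assume Llama 3 8B with an fp16 cache (2 bytes per element).
    The leading factor of 2 counts one key and one value tensor per layer.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


per_token = kv_cache_bytes(1)
total = kv_cache_bytes(500_000)

print(f"KV cache per token: {per_token / 1024:.0f} KiB")
print(f"KV cache for 500k tokens: {total / 2**30:.1f} GiB")
```

Under those assumptions the cache alone is roughly 61 GiB, which leaves room for the model weights within 96 GB. Flash attention does not shrink the cache itself; it avoids materializing the full attention score matrix, which would otherwise grow quadratically with context length.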
