| HN Mirror



	by wkat4242 785 days ago
	Wow, there was already a 256k version (Dolphin). 1M is insane. Be aware that you need a lot of memory, though.

1 comments

segmondy 785 days ago

With 144 GB of GPU memory, the most I can load for Llama 3 is a 232k-token context.


wkat4242 785 days ago

Which Llama 3 is that? 8B or 70B? And what kind of quantisation?

Just wondering. I'll never have those kinds of resources (well, not in the next 5 years), but I'm just trying to put it into perspective.


segmondy 784 days ago

8B, and it got better this morning: they merged in flash attention, so I can now load almost 500k tokens with 96 GB of VRAM. That said, you could possibly have this kind of resource; this is a cheap build, a mixture of old and used GPUs.
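
To put those numbers into perspective, here is a back-of-the-envelope sketch of how the KV cache alone grows with context length. It uses the published Llama 3 8B shape (32 layers, 8 KV heads, head dimension 128) and assumes an unquantised fp16 cache. The function names are made up for illustration, and the estimate ignores model weights, activations, and any attention scratch buffers, so real usage in a given runtime will be higher than this.

```python
# Rough KV-cache size estimate; not tied to any specific runtime.
# Shape figures are the published Llama 3 8B config.
N_LAYERS = 32
N_KV_HEADS = 8       # grouped-query attention: 8 KV heads
HEAD_DIM = 128
BYTES_PER_ELEM = 2   # assumption: fp16 cache, no cache quantisation


def kv_cache_bytes(n_tokens,
                   n_layers=N_LAYERS,
                   n_kv_heads=N_KV_HEADS,
                   head_dim=HEAD_DIM,
                   bytes_per_elem=BYTES_PER_ELEM):
    """Bytes needed to store keys and values for n_tokens of context."""
    # Factor 2 covers one key vector plus one value vector per head per layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * n_tokens


def to_gib(n_bytes):
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    # Context lengths mentioned in the thread.
    for ctx in (232_000, 500_000, 1_000_000):
        print(f"{ctx:>9,} tokens -> {to_gib(kv_cache_bytes(ctx)):6.1f} GiB KV cache")
```

Under these assumptions the cache costs 128 KiB per token, so it scales linearly with context and ends up far larger than the 8B weights at the lengths discussed here.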
