HN Mirror



	by tacoman 873 days ago
	I am using the exact same model on a Ryzen 5600G with 32 GB of RAM and an Nvidia P40 with 24 GB of VRAM, with 20 of 33 layers offloaded to the GPU and a 4K context. It uses 25 GB of system RAM and all 24 GB of VRAM, and generates 5-7 tokens per second.
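For a rough sense of that split, the reported numbers can be plugged into a small calculation. This is a sketch under a naive assumption: all layers are the same size and the 24 GB of VRAM holds only layer weights. In practice the context (KV cache) and scratch buffers also use VRAM, so the per-layer figure below is an upper bound, not a measurement. `offload_summary` is a hypothetical helper, not part of any inference tool.

```python
# Hypothetical helper: summarize a partial GPU offload from reported numbers.
# Assumes equally sized layers and that all used VRAM holds layer weights,
# so the per-layer figure is an upper bound, not a measurement.
def offload_summary(total_layers: int, gpu_layers: int, vram_used_gb: float) -> dict:
    if not 0 < gpu_layers <= total_layers:
        raise ValueError("gpu_layers must be between 1 and total_layers")
    return {
        "gpu_layers": gpu_layers,
        "cpu_layers": total_layers - gpu_layers,             # layers left on the CPU
        "gpu_fraction": gpu_layers / total_layers,           # share of layers on the GPU
        "vram_per_gpu_layer_gb": vram_used_gb / gpu_layers,  # naive upper bound
    }

summary = offload_summary(total_layers=33, gpu_layers=20, vram_used_gb=24.0)
print(summary["cpu_layers"])                         # 13
print(f"{summary['gpu_fraction']:.0%}")              # 61%
print(f"{summary['vram_per_gpu_layer_gb']:.1f} GB")  # 1.2 GB
```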


zaat 873 days ago

Context is set to 32768. I didn't change it, so I guess that's the model's default.

Thanks for making me feel better about investing in that motherboard + CPU + RAM upgrade and deferring the GPU upgrade.


attentive 872 days ago

And Groq does 485.08 T/s on Mixtral 8x7B-32k.

I am not sure local models have any future other than POC/research. It depends on the cost, of course.
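To put the two throughput figures quoted in this thread side by side, here is a small sketch that converts tokens per second into wall-clock time for one response. The 500-token response length is a hypothetical example value, and the calculation assumes a steady generation rate and ignores prompt processing and network latency.

```python
# Compare generation time at the throughputs quoted in this thread.
# The 500-token response length is a hypothetical example value.
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate `tokens` at a steady rate (ignores prompt processing)."""
    return tokens / tokens_per_second

LOCAL_TPS = (5.0, 7.0)  # reported range for the Ryzen 5600G + P40 setup
GROQ_TPS = 485.08       # reported for Groq on Mixtral 8x7B-32k

n = 500
slow, fast = (seconds_for(n, tps) for tps in LOCAL_TPS)
print(f"local: {fast:.1f}-{slow:.1f} s")           # local: 71.4-100.0 s
print(f"Groq:  {seconds_for(n, GROQ_TPS):.2f} s")  # Groq:  1.03 s
print(f"speedup: {GROQ_TPS / LOCAL_TPS[1]:.0f}x-{GROQ_TPS / LOCAL_TPS[0]:.0f}x")  # speedup: 69x-97x
```

Whether that gap matters depends on the use case: for an interactive chat, roughly 70-100 seconds versus about a second per 500-token answer is the practical difference being argued about.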


tome 869 days ago

(Groqster here) For anyone who wants to try it, you can go to https://chat.groq.com/ and choose Mixtral from the drop-down menu. Also, feel free to ask me any questions about Groq hardware or service.
