by tacoman 873 days ago
I am using the exact same model. Ryzen 5600G w/32GB and an Nvidia P40 w/24GB VRAM

20/33 layers offloaded to GPU, 4K context. Uses 25GB system RAM and all 24GB VRAM. 5-7 tokens per second.


Context is set to 32768. I didn't change it, so I guess that's the model's default.

Thanks for making me feel better about investing in that motherboard + CPU + RAM upgrade and deferring the GPU upgrade.

And Groq does 485.08 T/s on Mixtral 8x7B-32k.
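
For scale, here is a rough back-of-the-envelope sketch using only the figures quoted in this thread (5-7 tokens per second locally, 485.08 T/s on Groq). The 500-token reply length and the helper names `speedup` and `seconds_for` are assumptions made up for illustration, and the sketch ignores prompt processing and network latency.

```python
# Throughput figures quoted in the thread.
LOCAL_TPS_RANGE = (5.0, 7.0)  # local P40 setup, 20/33 layers offloaded
GROQ_TPS = 485.08             # Groq, Mixtral 8x7B-32k

# Hypothetical reply length, chosen only for illustration.
REPLY_TOKENS = 500


def speedup(fast_tps: float, slow_tps: float) -> float:
    """How many times faster fast_tps is than slow_tps."""
    return fast_tps / slow_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Generation time in seconds at a steady tokens-per-second rate."""
    return tokens / tps


low_tps, high_tps = LOCAL_TPS_RANGE
print(f"Groq vs local: {speedup(GROQ_TPS, high_tps):.0f}x to "
      f"{speedup(GROQ_TPS, low_tps):.0f}x faster")
print(f"{REPLY_TOKENS} tokens locally: "
      f"{seconds_for(REPLY_TOKENS, high_tps):.0f}s to "
      f"{seconds_for(REPLY_TOKENS, low_tps):.0f}s")
print(f"{REPLY_TOKENS} tokens on Groq: {seconds_for(REPLY_TOKENS, GROQ_TPS):.1f}s")
```

So the gap is roughly two orders of magnitude in raw generation speed, which is the comparison the cost question below hinges on.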

I am not sure local models have any future other than POC/research. Depends on the cost of course.

(Groqster here) For anyone who wants to try it, you can go to https://chat.groq.com/ and choose Mixtral from the drop-down menu. Also, feel free to ask me any questions about Groq hardware or service.