| HN Mirror



	by hehdhdjehehegwv 781 days ago
	Funny thing is I’m still in love with Mistral 7B as it absolutely shreds on a nice GPU. For simple tasks it’s totally sufficient.

1 comments

qeternity 781 days ago

Llama3 8B is for all intents and purposes just as fast.


minosu 781 days ago

Mistral 7B runs inference about 18% faster for me as a 4-bit quantized version on an A100. That's definitely relevant when running anything but chatbots.


tmostak 781 days ago

Are you measuring tokens/sec or words per second?

The difference matters because, in my experience, Llama 3, by virtue of its giant vocabulary, tokenizes text with 20-25% fewer tokens than something like Mistral. So even if it's 18% slower in terms of tokens/second, it may, depending on the text content, actually output a given body of text faster.
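
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes "18% slower" means Llama 3 produces 0.82x as many tokens per second as Mistral, and that the 20-25% token savings holds for the text in question; both are assumptions, and the function name is just illustrative.

```python
def relative_wall_time(token_reduction: float, speed_penalty: float) -> float:
    """Wall-clock time of model B relative to model A for the same text.

    token_reduction: fraction fewer tokens B needs (0.20 means 20% fewer).
    speed_penalty:   fraction slower B is in tokens/sec (0.18 means 18% slower).
    """
    tokens_ratio = 1.0 - token_reduction  # B's token count / A's token count
    rate_ratio = 1.0 - speed_penalty      # B's tokens/sec / A's tokens/sec
    # time = tokens / rate, so the ratio of times is the ratio of ratios
    return tokens_ratio / rate_ratio


for reduction in (0.20, 0.25):
    ratio = relative_wall_time(reduction, 0.18)
    print(f"{reduction:.0%} fewer tokens: {ratio:.3f}x the wall-clock time")
# 20% fewer tokens: 0.976x the wall-clock time
# 25% fewer tokens: 0.915x the wall-clock time
```

Under those assumptions the slower-per-token model finishes the same text roughly 2-9% sooner. The break-even point is when the token savings equals the tokens/sec penalty, so the outcome flips for text where the larger vocabulary helps less.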
