| HN Mirror



	by andreinwald 319 days ago
	It works on the small Llama-3.2-1B model, especially on less powerful GPU devices

1 comments

The answer is still terrible for the model size. Maybe it's the 4-bit quantization; smaller models tend to react worse to that.
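
To make the quantization point concrete, here is a rough sketch of what 4-bit rounding does to weights. It assumes plain symmetric round-to-nearest with one scale per group of 32 values, which is not necessarily what the q4f16_1 format does, and the Gaussian "weights" are synthetic stand-ins rather than anything taken from a real model. All function names are made up for the example.

```python
import random

def quantize_group(weights, bits=4):
    """Symmetric round-to-nearest quantization of one group (illustrative scheme)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for signed 4-bit, 127 for signed 8-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid a zero scale
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_group(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]

def mean_abs_error(weights, bits, group_size=32):
    """Average absolute round-trip error over all weights."""
    total = 0.0
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        codes, scale = quantize_group(group, bits)
        restored = dequantize_group(codes, scale)
        total += sum(abs(a - b) for a, b in zip(group, restored))
    return total / len(weights)

random.seed(0)
# Synthetic stand-in for a weight tensor (assumption, not real model weights).
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

err4 = mean_abs_error(weights, bits=4)
err8 = mean_abs_error(weights, bits=8)
print(f"4-bit mean abs error: {err4:.6f}")
print(f"8-bit mean abs error: {err8:.6f}")
```

This only shows that the per-weight rounding error grows as the bit width shrinks. Whether that error is what hurts a 1B model's answers is still a guess.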

For reference, [1] is what stock Qwen3-0.6B would answer. Not a perfect answer, but much better at nearly half the number of parameters.

It's likely the quantization on "Llama-3.2-1B-Instruct-q4f16_1-MLC". inference.net generated this more coherent answer: https://hst.sh/ovilewofox.md