

	by rileyphone 787 days ago
	The bigger size is probably from the bigger vocabulary in the tokenizer. But most people are running this model quantized to at least 8 bits, and it still works reasonably well down to 3-4 bits per weight (bpw).

1 comment

> The bigger size is probably from the bigger vocabulary in the tokenizer.

How does that affect anything? It still uses 16-bit floats in the model, doesn't it?
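
For what it's worth, here is a rough sketch of the arithmetic behind the vocabulary claim. The precision per weight stays the same, but the token embedding matrix (and the output projection, if it is not tied to the embedding) has one row per vocabulary entry, so a bigger vocabulary means more weights to store. The thread does not name the model, so the vocabulary sizes and hidden size below are made-up illustrative numbers, and `embedding_params` and `size_gib` are hypothetical helper names.

```python
def embedding_params(vocab_size: int, hidden_size: int, tied: bool = False) -> int:
    """Weights in the input embedding plus the output projection.

    Each matrix has shape (vocab_size, hidden_size). If the two are tied,
    only one copy is stored.
    """
    matrices = 1 if tied else 2
    return matrices * vocab_size * hidden_size


def size_gib(num_params: int, bits_per_weight: float) -> float:
    """Storage for num_params weights at the given precision, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Assumed, illustrative values (not taken from the thread).
HIDDEN_SIZE = 4096
SMALL_VOCAB = 32_000
LARGE_VOCAB = 128_000

# Extra weights that come purely from the larger vocabulary.
extra = embedding_params(LARGE_VOCAB, HIDDEN_SIZE) - embedding_params(SMALL_VOCAB, HIDDEN_SIZE)

print(f"extra weights from the larger vocabulary: {extra:,}")
for bpw in (16, 8, 4):
    print(f"  at {bpw:>2} bits per weight: {size_gib(extra, bpw):.2f} GiB")
```

Under these assumptions the extra vocabulary adds several hundred million weights at the same 16-bit precision, and the same loop shows how much of that shrinks away once the model is quantized to 8 or 4 bpw.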