Y
Hacker News
new
|
ask
|
show
|
jobs
by
rileyphone
787 days ago
The bigger size is probably from the bigger vocabulary in the tokenizer. But most people are running this model quantized at least to 8 bits, and still reasonably down to 3-4 bpw.
1 comments
kristianp
786 days ago
> The bigger size is probably from the bigger vocabulary in the tokenizer.
How does that affect anything? It still uses 16 bit floats in the model doesn't it?
link
How does that affect anything? It still uses 16 bit floats in the model doesn't it?