Hacker News
by kristianp 792 days ago
> The bigger size is probably from the bigger vocabulary in the tokenizer.

How does that affect anything? It still uses 16-bit floats in the model, doesn't it?
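The quoted claim can be made concrete. Precision fixes the cost of each weight (2 bytes in fp16/bf16), but the token-embedding table is a `vocab_size x hidden_dim` matrix, so a bigger vocabulary adds more weights. Below is a rough sketch under stated assumptions. The hidden size (3072) and the two vocabulary sizes are hypothetical. Whether the output head shares weights with the embedding table (`tied_embeddings`) varies by model and is also an assumption here.

```python
# Hypothetical sketch: how vocabulary size feeds into parameter count and
# size on disk, assuming 16-bit weights (2 bytes per parameter).

BYTES_PER_PARAM_FP16 = 2  # fp16 and bf16 both store 2 bytes per value


def vocab_params(vocab_size: int, hidden_dim: int, tied_embeddings: bool) -> int:
    """Parameters in the token-embedding table, plus the output head if untied.

    The embedding table is a vocab_size x hidden_dim matrix. An untied
    output projection (LM head) adds a second matrix of the same shape.
    """
    per_matrix = vocab_size * hidden_dim
    return per_matrix if tied_embeddings else 2 * per_matrix


def vocab_bytes(vocab_size: int, hidden_dim: int, tied_embeddings: bool) -> int:
    """Bytes those parameters occupy when stored as 16-bit floats."""
    return vocab_params(vocab_size, hidden_dim, tied_embeddings) * BYTES_PER_PARAM_FP16


if __name__ == "__main__":
    hidden = 3072  # hypothetical model width
    for vocab in (32_000, 256_000):  # hypothetical small vs. large vocabulary
        gib = vocab_bytes(vocab, hidden, tied_embeddings=False) / 2**30
        print(f"vocab={vocab:>7}: {gib:.2f} GiB of vocab-related weights in fp16")
```

The bytes per weight stay the same in both cases. Only the number of weights changes, and it grows linearly with the vocabulary size.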