by matsemann 1179 days ago
Maybe a lot of the data is embedding values or tokenizer data, where a single prompt only uses a fraction of those values, and the rest of the model is quite small.

That shouldn't be the case. 30B refers to the parameter count of the model itself, so it directly represents the model's size, not the size of other components such as the tokenizer.
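A rough back-of-the-envelope check, as a sketch only: the vocabulary size and hidden dimension below are assumed illustrative values, not numbers from this thread, and the helper names are hypothetical. It estimates how many parameters a single token-embedding table would hold and what share of 30B that is.

```python
# Illustrative estimate only. VOCAB_SIZE and HIDDEN_DIM are assumptions
# chosen for the example, not figures stated in the thread.

def embedding_params(vocab_size: int, hidden_dim: int) -> int:
    """Parameters in one token-embedding table (vocab_size x hidden_dim)."""
    return vocab_size * hidden_dim


def weight_bytes(n_params: int, bytes_per_param: int = 2) -> int:
    """Raw weight storage, assuming 2 bytes per parameter (16-bit floats)."""
    return n_params * bytes_per_param


TOTAL_PARAMS = 30_000_000_000  # the "30B" under discussion
VOCAB_SIZE = 32_000            # assumed
HIDDEN_DIM = 6_656             # assumed

emb = embedding_params(VOCAB_SIZE, HIDDEN_DIM)
share = emb / TOTAL_PARAMS

print(f"embedding parameters: {emb:,}")
print(f"share of total:       {share:.2%}")
print(f"total weights at 16-bit: {weight_bytes(TOTAL_PARAMS) / 1e9:.0f} GB")
```

Under those assumptions the embedding table comes out well below 1% of the total, so it would not account for most of the model's size. A model with a much larger vocabulary would shift the ratio somewhat.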