Y
Hacker News
new
|
ask
|
show
|
jobs
by
matsemann
1179 days ago
Maybe lots of the data is embedding values or tokenizer stuff, where a single prompt uses a fraction of those values. And then the rest of the model is quite small.
1 comments
w1nk
1179 days ago
That shouldn't be the case. 30B is a number that directly represents the size of the model, not the size of the other components.
link