by code_runner 1230 days ago
I’ll do my best.

Number of params is the number of weights in the model, i.e. the number of learnable values that training adjusts.
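To make that concrete, here's a rough sketch of counting params for a small fully connected network. The layer sizes and the `count_params` helper are made up for illustration, and I'm counting biases along with the weights since they are learned too.

```python
def count_params(layer_sizes):
    """Count learnable values in a plain fully connected network.

    layer_sizes is a hypothetical list like [inputs, hidden, outputs].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input/output pair
        total += n_out         # one bias per output unit
    return total


if __name__ == "__main__":
    # 784 inputs, one hidden layer of 128 units, 10 outputs
    print(count_params([784, 128, 10]))
```

Large language models are counted the same way, just with far more and far bigger layers.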

Number of tokens is how many tokens it saw during training.

Vocab size is the number of distinct tokens.
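A toy example of the difference between the two counts, assuming a naive whitespace tokenizer. Real models use subword tokenizers with a vocab that is fixed ahead of time, so treat this only as a sketch of the idea.

```python
def token_stats(text):
    """Return (number of tokens, number of distinct tokens) for a string.

    Splitting on whitespace is an assumption for illustration only.
    """
    tokens = text.split()
    return len(tokens), len(set(tokens))


if __name__ == "__main__":
    num_tokens, vocab_size = token_stats(
        "the cat sat on the mat and the dog sat too"
    )
    print(num_tokens, vocab_size)
```

The token count grows with the amount of text; the vocab size only grows when a token shows up that hasn't been seen before.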

How params, tokens, and compute relate to each other, and how that trade-off affects model performance, is something people have studied a good deal. https://arxiv.org/pdf/2203.15556.pdf
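For a back-of-the-envelope feel, a common approximation (also used in that paper) is that training compute is about 6 FLOPs per param per token. The sketch below plugs in the paper's 70B param, 1.4T token model; the function names are mine and the formula is only a rough estimate.

```python
def train_flops(num_params, num_tokens):
    """Approximate training compute as 6 * params * tokens (a rough rule)."""
    return 6 * num_params * num_tokens


def tokens_per_param(num_params, num_tokens):
    """How many training tokens the model saw per learnable value."""
    return num_tokens / num_params


if __name__ == "__main__":
    params = 70e9    # 70B params
    tokens = 1.4e12  # 1.4T training tokens
    print(f"{train_flops(params, tokens):.2e} FLOPs")
    print(f"{tokens_per_param(params, tokens):.0f} tokens per param")
```

That works out to roughly 20 tokens per param, which is where the often quoted rule of thumb comes from.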