by zozbot234 87 days ago
The ~82B figure is an attempt to express the model's performance as the size of an equivalent dense model. The number of active parameters is the ~17B figure.
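For what it's worth, the comment doesn't say how the ~82B number was derived. One rule of thumb sometimes used for mixture-of-experts models is the geometric mean of active and total parameters. Here is a minimal sketch under that assumption. The function name `dense_equivalent_params` and the 400B total are hypothetical, picked only for illustration and not taken from the comment.

```python
import math


def dense_equivalent_params(active_params: float, total_params: float) -> float:
    """Rough dense-equivalent size of a mixture-of-experts model.

    Uses the geometric mean of active and total parameter counts.
    This is a heuristic assumed for illustration, not an exact law.
    """
    if active_params <= 0 or total_params < active_params:
        raise ValueError("expected 0 < active_params <= total_params")
    return math.sqrt(active_params * total_params)


# 17e9 is the active-parameter figure from the comment.
# 400e9 is a hypothetical total-parameter count, assumed for illustration.
estimate = dense_equivalent_params(17e9, 400e9)
print(f"~{estimate / 1e9:.0f}B dense-equivalent")
```

Under those assumed inputs the estimate lands in the low 80s of billions, the same ballpark as the ~82B figure, but that only shows the heuristic is consistent with it, not that it is where the number came from.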