by stingraycharles 84 days ago
OK, I'm by no means an expert on this and am happy to stand corrected. But as I understand it, to estimate the amount of active memory that's required, it's more accurate to go by the ~82B number, right?
1 comment

The ~82B figure is an attempt to compare performance to an equivalent dense model. The number of active parameters is the ~17B figure.
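
For illustration, here is a rough sketch of how a dense-equivalent figure like ~82B could be derived. It assumes the number comes from the geometric-mean rule of thumb, sqrt(total × active), and it assumes a total parameter count of 400B. Neither the formula nor the total is stated in the thread, and `dense_equivalent` is a hypothetical helper name.

```python
import math


def dense_equivalent(total_params_b: float, active_params_b: float) -> float:
    """Geometric-mean rule of thumb for a mixture-of-experts model.

    Returns a rough dense-equivalent size in billions of parameters.
    This is a heuristic for comparing quality, not a memory estimate.
    """
    return math.sqrt(total_params_b * active_params_b)


ACTIVE_B = 17.0   # active parameters per token, from the thread
TOTAL_B = 400.0   # ASSUMPTION: total parameter count, not stated in the thread

estimate = dense_equivalent(TOTAL_B, ACTIVE_B)
print(f"dense-equivalent estimate: ~{estimate:.0f}B")
```

Under those assumptions the estimate lands near the ~82B figure, which would explain why it sits between the active and total counts without corresponding to either one.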