by Simon321 790 days ago
It's a mixture-of-experts model. Only a small fraction of those parameters is active for any given token. I believe it's 16x110B, i.e. 16 experts of about 110B parameters each.
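To make the "only a small part is active" point concrete, here is a rough back-of-the-envelope sketch. It takes the 16x110B figure at face value and assumes a router that picks the top 2 experts per token; `top_k=2` is my assumption, not something stated above, and the function names are made up for illustration. It also ignores shared, non-expert parameters (attention, embeddings), so treat the numbers as illustrative only.

```python
import math


def moe_param_counts(num_experts, params_per_expert, top_k):
    """Return (total, active) expert parameter counts for a single token."""
    if not 1 <= top_k <= num_experts:
        raise ValueError("top_k must be between 1 and num_experts")
    total = num_experts * params_per_expert
    active = top_k * params_per_expert  # only the routed experts run
    return total, active


def route_top_k(router_logits, top_k):
    """Pick the top_k experts by logit and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:top_k]
    peak = max(router_logits[i] for i in chosen)  # subtract for stability
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(chosen, exps)]


# Hypothetical configuration: 16 experts of 110B each, top-2 routing (assumed).
total, active = moe_param_counts(16, 110 * 10**9, top_k=2)
print(f"total expert params:  {total / 1e12:.2f}T")
print(f"active per token:     {active / 1e9:.0f}B ({active / total:.1%})")

# Toy router output for one token over 4 experts.
print(route_top_k([0.1, 2.0, -1.0, 1.0], top_k=2))
```

Under those assumptions each token touches 2 of the 16 experts, so the per-token compute cost looks much closer to a ~220B model than to the full parameter count, even though all experts still have to sit in memory.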