Hacker News
by numeri 930 days ago
You're not necessarily wrong, but I'd imagine this is almost prohibitively slow. Also, this model seems to use two experts per token.