Hacker News
by numeri, 930 days ago
You're not necessarily wrong, but I'd imagine this is almost prohibitively slow. Also, this model seems to use two experts per token.