Hacker News
by whimsicalism 502 days ago
It probably also has to do with their internal infra. If it were just about dense models being easier for the OSS community to use & build on, they should probably be training MoEs and then distilling to dense.