by yababa_y, 5 days ago
Separately trained experts can achieve better performance within their activated regime, and that DOES result in a smarter model. The Claude system cards talk about this, and e.g. there is https://openreview.net/forum?id=iydmH9boLb to read...