Hacker News
by qoez, 341 days ago
It's basically a mixture of experts but instead of a learned operator picking the predicted best model, you use a 'max' operator across all experts.
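To make the contrast concrete, here's a rough sketch of how I read it. Everything in it is an assumption for illustration: the toy linear experts, the hypothetical names `moe_learned_gate` and `max_over_experts`, and the idea that each expert emits a single scalar you can take a max over. It's not taken from any particular paper or library.

```python
from typing import Callable, List, Sequence

# Hypothetical setup: an "expert" maps an input vector to one scalar output.
Expert = Callable[[Sequence[float]], float]


def make_linear_expert(weights: Sequence[float], bias: float = 0.0) -> Expert:
    """Build a toy linear expert (illustrative only)."""
    def expert(x: Sequence[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return expert


def moe_learned_gate(
    x: Sequence[float],
    experts: List[Expert],
    gate_weights: List[Sequence[float]],
) -> float:
    """Top-1 mixture of experts: a (learned) gate scores each expert from the
    input alone, and only the highest-scoring expert is evaluated."""
    logits = [sum(w * xi for w, xi in zip(gw, x)) for gw in gate_weights]
    chosen = max(range(len(experts)), key=lambda i: logits[i])
    return experts[chosen](x)


def max_over_experts(x: Sequence[float], experts: List[Expert]) -> float:
    """No gate: evaluate every expert and keep the largest output."""
    return max(expert(x) for expert in experts)


experts = [
    make_linear_expert([1.0, 0.0]),
    make_linear_expert([0.0, 1.0]),
    make_linear_expert([-1.0, -1.0], bias=0.5),
]
# Gate weights chosen by hand so the gate's guess differs from the true max.
gate_weights = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
x = [2.0, 1.0]

gated = moe_learned_gate(x, experts, gate_weights)
maxed = max_over_experts(x, experts)
```

The trade-off the sketch shows: the gate only runs one expert but can guess wrong, while the max version has no routing parameters to learn but has to evaluate every expert.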