Hacker News
by qoez 341 days ago
It's basically a mixture of experts, but instead of a learned gating operator picking the expert it predicts is best, you apply a 'max' operator across all the experts.
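Here's a minimal sketch of the contrast. It assumes each expert is just a linear map, that the "learned operator" is top-1 routing by a linear gate, and that "max across all experts" means an elementwise max over the experts' outputs. The function names (`gated_moe`, `max_over_experts`) are hypothetical and only for illustration:

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # out_dim x in_dim


def linear(weights: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a weight matrix by an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def gated_moe(experts: List[Matrix], gate: Matrix, x: Sequence[float]) -> Vector:
    """Top-1 mixture of experts: a learned gate scores every expert,
    and only the highest-scoring expert is evaluated."""
    scores = linear(gate, x)  # one score per expert
    best = max(range(len(experts)), key=lambda i: scores[i])
    return linear(experts[best], x)


def max_over_experts(experts: List[Matrix], x: Sequence[float]) -> Vector:
    """No gate: evaluate every expert and take the elementwise max."""
    outputs = [linear(w, x) for w in experts]
    return [max(vals) for vals in zip(*outputs)]


if __name__ == "__main__":
    experts = [[[1.0, 0.0]], [[0.0, 1.0]]]  # two experts, 1-d output each
    gate = [[1.0, 0.0], [0.0, 0.0]]         # toy gate that favors expert 0 here
    x = [2.0, 3.0]
    print(gated_moe(experts, gate, x))      # [2.0]: the gate's pick
    print(max_over_experts(experts, x))     # [3.0]: the larger output wins
```

One trade-off the sketch makes visible: the gated version only runs the expert it routes to, while the max version has to run all of them. Also, the elementwise max can combine different experts across output dimensions, whereas top-1 routing returns a single expert's whole output.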