Hacker News
by oliveiracwb 118 days ago
With the advent of Mixture-of-Experts (MoE) models, efficiency gains became possible. However, MoEs still fall well short of the balance and stability of dense models. In my view, most of the progress comes from tuning the router based on good and bad outcomes, with only marginal gains in real intelligence.
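For readers unfamiliar with the "router" and "balance" mentioned above, here is a minimal sketch. It assumes top-k softmax gating and a Switch-Transformer-style auxiliary load-balancing loss (N · Σ f_i · P_i). The function names are hypothetical, and extending f_i to top-k by counting every token-to-expert assignment is an assumption. This is not the commenter's setup or any library's API.

```python
import math
from typing import List, Tuple

def softmax(logits: List[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(router_logits: List[List[float]], k: int
                ) -> Tuple[List[List[Tuple[int, float]]], List[List[float]]]:
    """For each token, keep the k most probable experts and renormalize their gate weights."""
    assignments, probs_all = [], []
    for logits in router_logits:
        probs = softmax(logits)
        probs_all.append(probs)
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
        total = sum(probs[i] for i in top)
        assignments.append([(i, probs[i] / total) for i in top])
    return assignments, probs_all

def load_balancing_loss(assignments: List[List[Tuple[int, float]]],
                        probs_all: List[List[float]], num_experts: int) -> float:
    """Auxiliary loss N * sum_i f_i * P_i (Switch-Transformer style, assumed here).

    f_i: fraction of token-to-expert assignments sent to expert i.
    P_i: mean router probability for expert i over all tokens.
    Evenly spread routing gives 1.0; collapse onto one expert pushes it toward N.
    """
    num_tokens = len(probs_all)
    total_assignments = sum(len(a) for a in assignments)
    counts = [0] * num_experts
    for a in assignments:
        for expert, _ in a:
            counts[expert] += 1
    f = [c / total_assignments for c in counts]
    p = [sum(probs[i] for probs in probs_all) / num_tokens for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))

# Four tokens, four experts, top-1 routing.
balanced_logits = [[10.0 if e == t else 0.0 for e in range(4)] for t in range(4)]
collapsed_logits = [[10.0, 0.0, 0.0, 0.0] for _ in range(4)]

a_bal, p_bal = route_top_k(balanced_logits, k=1)
a_col, p_col = route_top_k(collapsed_logits, k=1)
balanced = load_balancing_loss(a_bal, p_bal, num_experts=4)
collapsed = load_balancing_loss(a_col, p_col, num_experts=4)
print(balanced, collapsed)
```

Minimizing this term alongside the task loss pushes the router toward spreading tokens across experts. This is the kind of balancing act dense models do not need.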