| HN Mirror

The rumor is that it's a mixture of experts model, which can't be compared directly on parameter count like this because most weights are unused by most inference passes. (So, it's possible that 400B non-MoE is the same approximate "strength" as 1.8T MoE in general.)