Hacker News
by Filligree 99 days ago
They serve it about 2x slower. So it must have about 2x the active parameters.

It could still be 10x larger overall, though that would not make it 10x more expensive.
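A rough sketch of that arithmetic, assuming time per token scales linearly with active parameters (a simplification that ignores batching, hardware, and interconnect effects). The function names and the 30B baseline are hypothetical, picked only for illustration:

```python
def implied_active_params(baseline_active_b: float, slowdown: float) -> float:
    """Estimate active parameters (in billions) from an observed slowdown.

    Assumes time per token is proportional to active parameters.
    """
    return baseline_active_b * slowdown


def forward_flops_per_token(active_params_b: float) -> float:
    """Approximate forward-pass FLOPs per token as 2 * active parameters.

    Total parameter count does not appear here: in a mixture-of-experts
    model, experts that are not routed to add memory cost, not compute.
    """
    return 2.0 * active_params_b * 1e9


# Hypothetical baseline: a 30B-active model served at some reference speed.
baseline_active_b = 30.0
new_active_b = implied_active_params(baseline_active_b, slowdown=2.0)

# Per-token compute doubles with active parameters, whether the total
# parameter count stays the same or grows 10x.
compute_ratio = forward_flops_per_token(new_active_b) / forward_flops_per_token(baseline_active_b)
```

Under these assumptions a 10x larger total mostly shows up as extra memory to hold the weights, which is why it would not be 10x more expensive to serve.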

1 comment

Yes, but I highly doubt they would increase sparsity much vs the Chinese models.

That's how you get Llama 4.

Pretty much every major lab settled on ~3-5% sparsity for a reason.
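To put numbers on that, here is a minimal sketch that reads "~3-5% sparsity" as the share of parameters active per token (my reading of the comment, not a formal definition). The 60B active figure is hypothetical:

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of parameters used per token (active / total)."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return active_params_b / total_params_b


def total_for_fraction(active_params_b: float, fraction: float) -> float:
    """Total parameters (in billions) implied by an active count and an active share."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return active_params_b / fraction


# Hypothetical model with 60B active parameters.
active_b = 60.0
low_total_b = total_for_fraction(active_b, 0.05)   # total size at a 5% active share
high_total_b = total_for_fraction(active_b, 0.03)  # total size at a 3% active share
```

If the active share stays in that range, the total size is pinned to roughly 20x to 33x the active count, so a much larger total would mean going sparser than that.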