by jychang 98 days ago
Yes, but I highly doubt they would increase sparsity much vs. the Chinese models.

That's how you get Llama 4.

Pretty much every major lab settled on ~3-5% sparsity for a reason.