Hacker News
by jychang 98 days ago
Yes, but I highly doubt they would increase sparsity much vs. the Chinese models.
That's how you get Llama 4.
Pretty much every major lab settled on ~3-5% sparsity for a reason.