Hacker News
by cheekygeeky 29 days ago
> He removed MLP from Qwen and the model still could do transformation tasks on input but lost knowledge.

But not deterministic?