by t-vi 1021 days ago
> Is avoiding CF potentially just a matter of sheer scale?

My intuition would be that with a larger model you get more directions that are orthogonal to the gradients of previous samples, so an update for new data is less likely to undo what was learned earlier.
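
A toy way to see the geometric part of this, under the loud assumption that gradients behave like random Gaussian vectors (real gradients are correlated, so this is only a sketch of the intuition, not evidence about actual networks): the average |cosine| between two random directions shrinks as the dimension grows. The function name `mean_abs_cosine` and its parameters are made up for illustration.

```python
import math
import random


def mean_abs_cosine(dim, pairs=200, seed=0):
    """Average |cos(angle)| between pairs of random Gaussian vectors in R^dim.

    Assumption: random vectors stand in for per-sample gradients.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        total += abs(dot / (norm_u * norm_v))
    return total / pairs


# Larger "models" (more dimensions) give pairs that are closer to orthogonal.
for dim in (10, 100, 1000):
    print(dim, round(mean_abs_cosine(dim), 3))
```

Separately from the random-vector picture, plain linear algebra says that k previous-sample gradients in a d-dimensional parameter space leave at least d - k exactly orthogonal directions, which grows with model size for a fixed k.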