Hacker News
by tommiegannert 607 days ago
Plus the 2^-l(m) correction term.

Feels like multiplication shouldn't be needed for convergence, just monotonicity? I wonder how well it would perform if the model were actually trained the same way.