by ekleraki 1250 days ago
It is slower than the same backprop that we have used for decades now.

No comparisons to AdamW were made.

In fact, this algorithm uses backprop at its core, but it propagates through 0 layers.
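
To make the "0 layers" point concrete, here is a rough sketch of what a layer-local update looks like. This is not the paper's code. The objective (sum of squared ReLU activations), the function names, and the learning rate are all my own assumptions for illustration. The gradient is still the chain rule, just applied inside one layer, with the layer's input treated as a constant so nothing flows back to earlier layers.

```python
# Hypothetical layer-local update in pure Python (no autograd library).
# Assumption: the local objective is the sum of squared ReLU activations.

def relu(v):
    return [max(0.0, a) for a in v]

def layer_forward(W, x):
    # W is a list of weight rows, x is the layer input (treated as a constant).
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def local_objective(W, x):
    h = relu(layer_forward(W, x))
    return sum(a * a for a in h)

def local_gradient(W, x):
    # Chain rule within this single layer only:
    # d/dW[i][j] of sum_i relu(z_i)^2 is 2 * relu(z_i) * x[j].
    # No gradient with respect to x is computed, so nothing reaches earlier layers.
    h = relu(layer_forward(W, x))
    return [[2.0 * hi * xj for xj in x] for hi in h]

def local_step(W, x, lr=0.1):
    # One plain gradient-descent step on this layer's own weights.
    g = local_gradient(W, x)
    return [[w - lr * gw for w, gw in zip(row, grow)]
            for row, grow in zip(W, g)]

if __name__ == "__main__":
    W = [[1.0, -1.0], [-1.0, 0.5]]
    x = [2.0, 1.0]
    print(local_objective(W, x))
    print(local_gradient(W, x))
    print(local_objective(local_step(W, x), x))
```

Whether you minimize or maximize a local objective like this depends on the method. The only point here is that the weight update is an ordinary gradient, computed one layer deep.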