by tansey 3653 days ago
Just read (skimmed) this paper yesterday actually.

Looks interesting, but there are no timing graphs! It's kind of a strawman argument to say "We can't use Newton's method because it's too slow to calculate the Hessian," and then go and present all your performance graphs in terms of number of iterations.
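To make the iterations-versus-time point concrete, here is a minimal sketch that reports both numbers for plain gradient descent and Newton's method. Everything in it is an assumption for illustration and not from the paper: the toy quadratic, the `run` helper, and the step functions. The Hessian here is diagonal, so the Newton step is artificially cheap; on a real network each Newton iteration costs far more, which is exactly why an iteration count alone doesn't tell you much.

```python
import time

def run(step, x, grad, tol=1e-8, max_iters=100_000):
    """Iterate until the largest gradient entry is below tol.

    Returns the final point, the iteration count, and wall-clock seconds.
    """
    start = time.perf_counter()
    iters = 0
    while max(abs(g) for g in grad(x)) > tol and iters < max_iters:
        x = step(x)
        iters += 1
    elapsed = time.perf_counter() - start
    return x, iters, elapsed

# Hypothetical toy problem: f(x) = 0.5 * sum(a_i * x_i**2), badly conditioned.
A = [1.0, 10.0, 100.0]

def grad(x):
    return [a * xi for a, xi in zip(A, x)]

def gd_step(x, lr=1.0 / max(A)):
    # Plain gradient descent with a fixed step size.
    return [xi - lr * g for xi, g in zip(x, grad(x))]

def newton_step(x):
    # The Hessian is diag(A), so the Newton step divides each gradient entry by a_i.
    return [xi - g / a for xi, g, a in zip(x, grad(x), A)]

x0 = [1.0, 1.0, 1.0]
_, gd_iters, gd_time = run(gd_step, x0, grad)
_, nt_iters, nt_time = run(newton_step, x0, grad)

# Report both metrics side by side; either one alone can mislead.
print(f"gradient descent: {gd_iters} iterations, {gd_time:.6f} s")
print(f"Newton:           {nt_iters} iterations, {nt_time:.6f} s")
```

Plotting loss against `elapsed` as well as against `iters` is the comparison I'd want to see in the paper.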

1 comment

Usually, omitting time measurements means that each iteration is extravagantly expensive and the authors couldn't find test cases with good actual performance. In this case, though, the major twist seems to be that the cost of training the optimizer is hidden away entirely.

To be fair, it should be noted that there are no claims of actual good performance, only claims that the technology works: "Our experiments have confirmed that learned neural optimizers compare favorably against state-of-the-art optimization methods used in deep learning."