Hacker News
by jasonjmcghee 1260 days ago
I haven't seen this to be the case, fwiw. There was a paper in 2016 that did this and most were in the ~40% range.

But "any ml algorithm" isn't the point. It's a new optimization technique and should be applied to models/architectures that make sense with the problems they are being used on.

For example, they could have used a pretrained featurizer and trained the two layer model on top of it, with both back prop and FF and compared.

1 comment

> For example, they could have used a pretrained featurizer and trained the two layer model on top of it, with both back prop and FF and compared.

That assumes the weights/embeddings produced by a backprop-trained network are just as intelligible to a network trained by this alternative method as they are to one also trained by backprop.

I have personally seen pretrained embeddings used successfully with all kinds of classic ML algorithms (enets, tree-based, etc.) that have nothing to do with backprop.
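
To make that last point concrete, here is a minimal toy sketch in pure Python, not anything from the paper or the thread. `frozen_featurizer` is a hypothetical stand-in for a pretrained embedding model (its "weights" are hard-coded and never updated), and the downstream learner is a one-split decision stump fit by exhaustive search, so no gradients are involved anywhere downstream. The data, the feature map, and all function names are assumptions made up for illustration.

```python
# Hypothetical stand-in for a pretrained, frozen featurizer: its "weights"
# are fixed here and never updated by the downstream learner.
def frozen_featurizer(x):
    x0, x1 = x
    return [x0 + x1, x0 - x1, x0 * x1]


def fit_stump(features, labels):
    """Fit a one-split decision stump by exhaustive search (no gradients)."""
    best = None  # (errors, feature_index, threshold, label_if_above)
    for j in range(len(features[0])):
        values = sorted({row[j] for row in features})
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for above in (0, 1):
                errors = sum(
                    (above if row[j] > t else 1 - above) != y
                    for row, y in zip(features, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, above)
    return best[1:]


def predict_stump(stump, row):
    j, t, above = stump
    return above if row[j] > t else 1 - above


def accuracy(stump, features, labels):
    hits = sum(predict_stump(stump, r) == y for r, y in zip(features, labels))
    return hits / len(labels)


# Toy XOR-like data: label is 1 when both coordinates share a sign.
points = [(a, b) for a in (-2, -1, 1, 2) for b in (-2, -1, 1, 2)]
labels = [1 if a * b > 0 else 0 for a, b in points]

raw = [list(p) for p in points]
embedded = [frozen_featurizer(p) for p in points]

raw_acc = accuracy(fit_stump(raw, labels), raw, labels)
emb_acc = accuracy(fit_stump(embedded, labels), embedded, labels)
print(f"stump on raw inputs:      {raw_acc:.2f}")
print(f"stump on frozen features: {emb_acc:.2f}")
```

On this toy data the stump scores 0.50 on the raw inputs and 1.00 on the frozen features, since the third feature makes the classes separable with a single split. It only illustrates that a non-backprop learner can consume fixed features; it says nothing about how FF itself would fare on real pretrained embeddings.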