by jasonjmcghee
1260 days ago
I haven't seen this to be the case, fwiw. There was a paper in 2016 that did this, and most were in the ~40% range. But "any ML algorithm" isn't the point. It's a new optimization technique and should be applied to models/architectures that make sense for the problems they are being used on. For example, they could have used a pretrained featurizer, trained the two-layer model on top of it with both backprop and FF, and compared the two.

That does assume the weights/embeddings produced by a backprop-trained network are equally intelligible to a network trained by backprop and to one trained by this alternative method.
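
A rough sketch of what that comparison could look like, under several assumptions that are not from the thread: "FF" is read as a Forward-Forward-style layer-local rule, goodness is taken as the mean squared activation with a hypothetical threshold `theta`, the label is concatenated to the features as a one-hot vector, the data is a toy Gaussian-blob problem, and a fixed random projection stands in for the pretrained featurizer. All names (`featurize`, `train_backprop`, `train_ff`) are made up for illustration, and the printed accuracies say nothing about real benchmarks.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CLASSES, IN_DIM, FEAT_DIM, HID = 3, 10, 16, 32

# Assumption: a fixed random projection + ReLU stands in for a pretrained
# (backprop-trained) featurizer. It is frozen and shared by both heads.
W_feat = rng.normal(size=(IN_DIM, FEAT_DIM)) / np.sqrt(IN_DIM)

def featurize(x):
    return np.maximum(x @ W_feat, 0.0)

def make_data(n):
    # Toy data (illustrative only): unit Gaussian noise plus a per-class mean shift.
    y = rng.integers(0, N_CLASSES, n)
    centers = np.eye(N_CLASSES, IN_DIM) * 3.0
    return rng.normal(size=(n, IN_DIM)) + centers[y], y

def train_backprop(f, y, epochs=200, lr=0.1):
    # Two-layer head trained end to end with softmax cross-entropy.
    W1 = rng.normal(size=(FEAT_DIM, HID)) * 0.1
    W2 = rng.normal(size=(HID, N_CLASSES)) * 0.1
    onehot = np.eye(N_CLASSES)[y]
    for _ in range(epochs):
        h = np.maximum(f @ W1, 0.0)
        logits = h @ W2
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        g_logits = (p - onehot) / len(f)       # gradient of mean cross-entropy
        g_h = (g_logits @ W2.T) * (h > 0)      # backpropagate through the ReLU
        W2 -= lr * h.T @ g_logits
        W1 -= lr * f.T @ g_h
    return lambda f_new: np.argmax(np.maximum(f_new @ W1, 0.0) @ W2, axis=1)

def embed_label(f, y):
    # FF-style input: features concatenated with a one-hot label.
    return np.concatenate([f, np.eye(N_CLASSES)[y]], axis=1)

def normalize(h):
    return h / (np.linalg.norm(h, axis=1, keepdims=True) + 1e-8)

def ff_layer_step(W, x_pos, x_neg, theta, lr):
    # Layer-local update (modifies W in place): raise goodness above theta
    # for positive inputs and lower it below theta for negative inputs.
    for x, sign in ((x_pos, 1.0), (x_neg, -1.0)):
        h = np.maximum(x @ W, 0.0)
        good = (h ** 2).mean(axis=1)
        # loss = log(1 + exp(-sign * (good - theta))), differentiated w.r.t. good
        d_good = -sign / (1.0 + np.exp(sign * (good - theta)))
        g_h = d_good[:, None] * 2.0 * h / h.shape[1]
        W -= lr * x.T @ g_h / len(x)

def train_ff(f, y, epochs=200, lr=0.5, theta=1.0):
    W1 = rng.normal(size=(FEAT_DIM + N_CLASSES, HID)) * 0.1
    W2 = rng.normal(size=(HID, HID)) * 0.1
    for _ in range(epochs):
        # Negative examples: same features paired with a deliberately wrong label.
        y_wrong = (y + rng.integers(1, N_CLASSES, len(y))) % N_CLASSES
        x_pos = normalize(embed_label(f, y))
        x_neg = normalize(embed_label(f, y_wrong))
        ff_layer_step(W1, x_pos, x_neg, theta, lr)
        h_pos = normalize(np.maximum(x_pos @ W1, 0.0))
        h_neg = normalize(np.maximum(x_neg @ W1, 0.0))
        ff_layer_step(W2, h_pos, h_neg, theta, lr)   # no gradient flows into W1

    def predict(f_new):
        # Try every candidate label and keep the one with the highest total goodness.
        scores = []
        for c in range(N_CLASSES):
            x = normalize(embed_label(f_new, np.full(len(f_new), c)))
            h1 = np.maximum(x @ W1, 0.0)
            h2 = np.maximum(normalize(h1) @ W2, 0.0)
            scores.append((h1 ** 2).mean(axis=1) + (h2 ** 2).mean(axis=1))
        return np.argmax(np.stack(scores, axis=1), axis=1)
    return predict

x_tr, y_tr = make_data(600)
x_te, y_te = make_data(300)
f_tr, f_te = featurize(x_tr), featurize(x_te)
acc_bp = float((train_backprop(f_tr, y_tr)(f_te) == y_te).mean())
acc_ff = float((train_ff(f_tr, y_tr)(f_te) == y_te).mean())
print(f"backprop head: {acc_bp:.3f}  FF head: {acc_ff:.3f}")
```

The point of the setup is that both heads see exactly the same frozen features, so any gap comes from the training rule rather than the representation. It does not settle the caveat above about whether backprop-produced features suit an FF-trained head equally well; swapping in a real pretrained featurizer and a real dataset would be needed for that.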