by whimsicalism
1260 days ago
> For example, they could have used a pretrained featurizer and trained the two-layer model on top of it with both backprop and FF, then compared the results.

That assumes the weights/embeddings produced by a backprop-trained network are equally intelligible to a network also trained by backprop and to one trained by this alternative method.