by overlords
2341 days ago
Vowpal Wabbit has been doing this 'hashing trick' since the 2000s. It also does feature interactions, which are the same thing as a layer in transformers (an all-against-all matrix). So it seems like they are still catching up to where John Langford and crew were over a decade ago. And the Vowpal Wabbit approach is extremely fast to train because it's only doing stochastic gradient descent on a linear function, i.e. linear regression. Transformers are much slower to train. EDIT: Downvoters, please see my last leaf to see why they're effectively the same. The guy responding here seems unfamiliar with all the functionality of Vowpal Wabbit.
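For anyone who hasn't used VW, here is a minimal toy sketch of the ideas the comment above refers to: named features hashed into a fixed-size weight array, pairwise (quadratic) feature crosses, and plain SGD on squared loss. This is not VW's code. CRC32 stands in for VW's own hash function, and the names `h`, `featurize`, `HashedSGDRegressor`, the `a^b` naming of crossed features, and the 18-bit table size are all assumptions made for illustration.

```python
import zlib
from itertools import combinations

NUM_BITS = 18                  # table size 2**18 (assumed for this sketch)
MASK = (1 << NUM_BITS) - 1

def h(name: str) -> int:
    # Deterministic stand-in hash (CRC32), masked down to NUM_BITS bits
    return zlib.crc32(name.encode("utf-8")) & MASK

def featurize(features: dict[str, float], interactions: bool = True) -> dict[int, float]:
    """Hash named features into a sparse {index: value} dict.

    With interactions=True, every pair of features is also crossed into a
    new hashed feature whose value is the product of the two values.
    """
    out: dict[int, float] = {}
    items = sorted(features.items())
    for name, val in items:
        i = h(name)
        out[i] = out.get(i, 0.0) + val
    if interactions:
        for (a, va), (b, vb) in combinations(items, 2):
            i = h(f"{a}^{b}")
            out[i] = out.get(i, 0.0) + va * vb
    return out

class HashedSGDRegressor:
    """Linear model over hashed features, trained one example at a time."""

    def __init__(self, lr: float = 0.05):
        self.w = [0.0] * (1 << NUM_BITS)
        self.lr = lr

    def predict(self, x: dict[int, float]) -> float:
        return sum(self.w[i] * v for i, v in x.items())

    def update(self, x: dict[int, float], y: float) -> float:
        # One SGD step on the squared loss 0.5 * (prediction - y) ** 2
        err = self.predict(x) - y
        for i, v in x.items():
            self.w[i] -= self.lr * err * v
        return err

if __name__ == "__main__":
    # Toy target y = 2a + 3ab: only learnable because of the a^b cross feature
    data = [({"a": a, "b": b}, 2 * a + 3 * a * b)
            for a, b in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]]
    model = HashedSGDRegressor()
    for _ in range(1000):
        for feats, y in data:
            model.update(featurize(feats), y)
    # The target for this input is 2*3 + 3*3*1 = 15
    print(model.predict(featurize({"a": 3.0, "b": 1.0})))
```

The model itself stays linear; the crossed features are what let it fit a product term like `a*b`, and each update only touches the handful of weights the example hashes to.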
The VW hashing trick is about hashing your input data (e.g. words, fields, etc.) into an array to lower storage requirements and to handle novel data at run time.
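A minimal sketch of the hashing trick as described above: each token is hashed straight to an index in a fixed-size array, so no vocabulary has to be stored and a word never seen before still gets a slot (possibly colliding with another word). CRC32 from `zlib` and the names `hash_index` and `hash_features` are assumptions for illustration; VW uses its own hash function.

```python
import zlib

def hash_index(token: str, num_bits: int = 18) -> int:
    # Map a token straight to a slot in a 2**num_bits array; no vocabulary is kept
    return zlib.crc32(token.encode("utf-8")) & ((1 << num_bits) - 1)

def hash_features(tokens: list[str], num_bits: int = 18) -> dict[int, float]:
    # Sparse count vector. Unseen tokens at run time simply land in some slot,
    # and distinct tokens may collide in the same slot.
    vec: dict[int, float] = {}
    for tok in tokens:
        i = hash_index(tok, num_bits)
        vec[i] = vec.get(i, 0.0) + 1.0
    return vec

if __name__ == "__main__":
    print(hash_features("the cat sat on the mat".split(), num_bits=4))
```

Storage is fixed at `2**num_bits` slots no matter how large the vocabulary grows; the price is occasional collisions, which the model has to tolerate.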
The Google paper is about ordering the intermediate states of the neural network (i.e. vectors) so that vectors close to each other end up near each other in the ordering. This is done so you can chunk the resulting ordered list and perform computations on individual chunks (and their neighbors).
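To make the contrast concrete, here is a minimal sketch of one way to order vectors so that similar ones land together: angular locality-sensitive hashing with a random Gaussian projection (bucket = argmax over the projection and its negation), then a stable sort by bucket, fixed-size chunking, and letting each chunk see the previous one. The reply doesn't spell out the paper's exact scheme, so this particular LSH variant and every name here (`lsh_buckets`, `sort_and_chunk`, `chunk_with_previous`) are assumptions for illustration.

```python
import random

def lsh_buckets(vectors: list[list[float]], n_buckets: int, seed: int = 0) -> list[int]:
    """Angular LSH: vectors pointing in similar directions tend to share a bucket."""
    assert n_buckets % 2 == 0
    dim, half = len(vectors[0]), n_buckets // 2
    rng = random.Random(seed)
    # Random projection matrix of shape (dim, n_buckets // 2)
    R = [[rng.gauss(0.0, 1.0) for _ in range(half)] for _ in range(dim)]
    buckets = []
    for v in vectors:
        proj = [sum(v[d] * R[d][j] for d in range(dim)) for j in range(half)]
        scores = proj + [-p for p in proj]          # concatenate [vR, -vR]
        buckets.append(max(range(n_buckets), key=scores.__getitem__))
    return buckets

def sort_and_chunk(vectors, n_buckets, chunk_size, seed=0):
    # Order positions by bucket (ties keep original order), then cut into chunks
    buckets = lsh_buckets(vectors, n_buckets, seed)
    order = sorted(range(len(vectors)), key=lambda i: (buckets[i], i))
    chunks = [order[k:k + chunk_size] for k in range(0, len(order), chunk_size)]
    return buckets, order, chunks

def chunk_with_previous(chunks: list[list[int]], c: int) -> list[int]:
    # Items chunk c is computed against: the previous chunk (if any) plus itself
    prev = chunks[c - 1] if c > 0 else []
    return prev + chunks[c]

if __name__ == "__main__":
    vecs = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, -2.0, -3.0], [0.5, -1.0, 2.0]]
    buckets, order, chunks = sort_and_chunk(vecs, n_buckets=8, chunk_size=2)
    print(buckets, order, chunks)
```

Unlike the hashing trick, collisions here are the point: vectors that hash together are the ones that get computed together.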
The only thing I see in common is that they both use the word "hashing".