by Cybiote
1896 days ago
Transformers, which are currently waging a successful campaign to conquer all of deep learning, are largely stacked feed-forward networks: matrix multiplies and elementwise maps. Some ideas for making attention more scalable, such as LSH or large sparse attention matrices, seem like they'd be well suited to this approach. Their approach should also be readily adaptable to RNNs, including LSTMs. It's certainly worth investigating as an alternative for efficiently running and training giant networks on less expensive hardware.
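To make the "matrix multiplies and maps" point concrete, here is a minimal sketch of one single-head transformer block in plain Python. It assumes one attention head, no masking, and no layer norm, and every function name is hypothetical; it is an illustration of the general structure, not the method from the work being discussed. Each step is either a matrix multiply or a map (elementwise for ReLU, row-wise for softmax).

```python
import math

def matmul(a, b):
    # Plain matrix multiply: a is n x k, b is k x m, result is n x m.
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def relu(x):
    # Elementwise map.
    return max(0.0, x)

def softmax_rows(a):
    # Row-wise map; subtracting the row max keeps exp() numerically stable.
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        s = sum(exps)
        out.append([e / s for e in exps])
    return out

def attention(x, wq, wk, wv):
    # Three projections (matmuls), scaled scores (matmul + map),
    # softmax (map), weighted sum of values (matmul).
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(wq[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)

def feed_forward(x, w1, w2):
    # Matmul, elementwise ReLU, matmul.
    h = [[relu(v) for v in row] for row in matmul(x, w1)]
    return matmul(h, w2)

def block(x, wq, wk, wv, w1, w2):
    # Residual connections around attention and feed-forward.
    # Assumes wv and w2 map back to x's width so the additions line up.
    a = attention(x, wq, wk, wv)
    x = [[xi + ai for xi, ai in zip(xr, ar)] for xr, ar in zip(x, a)]
    f = feed_forward(x, w1, w2)
    return [[xi + fi for xi, fi in zip(xr, fr)] for xr, fr in zip(x, f)]
```

For example, `block([[1.0, 0.0], [0.0, 1.0]], I, I, I, I, I)` with `I` the 2x2 identity returns a 2x2 result. Stacking calls to `block` gives a (toy) transformer; schemes like LSH or sparse attention would change which entries of the `scores` matrix get computed, not the overall matmul-and-map structure.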