| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by yvdriess 2338 days ago

The reverse is true, embeddings are both the performance and memory-footprint bottleneck of modern NN models.

Check figure 6. of : https://arxiv.org/pdf/1906.00091.pdf

Embeddings are used to lookup sparse features, so you have those pesky data-dependent lookups.

2 comments

zeroxfe 2338 days ago

> The reverse is true, embeddings are both the performance and memory-footprint bottleneck of modern NN models.

They may be a bottleneck, but the alternative is worse -- you can't fit complex models with large vocabularies into GPU memory using sparse one-hot encodings.

link

yvdriess 2338 days ago

Surely you mean dense one-hot?

Technically, the sparse one-hot encoding is the most efficient in terms of memory footprint. You simply store the non-zero coordinates.

The problem in practice for GPUs is that sparse vector/matrix operations are too inefficient.

The whole point of something like this paper is to skip the entire 'densification' step and to directly deal with the sparse matrix input as a sparse matrix. The LSH is used in this paper improves on directly using SpMSpV, as that is also inefficient on CPUs, although to a lesser extent than GPUs.

link

thesz 2338 days ago

No, you can successfully fit complex models if you use byte-pair or similar encodings (morphessor [1] comes to mind).

[1] https://morfessor.readthedocs.io/en/latest/

You also will get much more meaningful embeddings from summing embeddings of part of the word.

link

Der_Einzige 2338 days ago

Only a bad bottleneck because proper database techniques aren't being used widely for embeddings yet within ML pipelines

See libraries like magnitude for proper embedding lookup implementations

link