by svcrunch 1715 days ago
> An example in NLP world is BERT-like NNs, that allow you to embed your text into a dense vector representation.

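I might say transformer-based NNs instead. The problem with cross-attentional models like BERT is that they don't scale to large datasets: the model attends over the query and the document together, so every query-document pair needs its own forward pass and nothing can be precomputed per document. They are more often used to rerank results within an IR pipeline. However, even for that use they require distillation.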
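To make the scaling point concrete, here is a rough sketch of a retrieve-then-rerank pipeline. Everything in it is a toy assumption of mine, not a real model: `embed` is a bag-of-words stand-in for a dense encoder, `pair_score` is a token-overlap stand-in for a cross-attentional scorer, and `retrieve_then_rerank`, `DOCS`, and `DOC_VECS` are hypothetical names. The only thing it is meant to show is the cost structure: document vectors are computed once, while the pairwise scorer runs only on the top k candidates.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a dense encoder: L2-normalized bag-of-words counts."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


def pair_score(query, doc):
    """Toy stand-in for a cross-attentional scorer.

    It looks at the query and the document jointly, so it cannot be
    precomputed per document. A real model would cost one full forward
    pass per (query, doc) pair.
    """
    q, d = query.lower().split(), doc.lower().split()
    d_tokens = set(d)
    d_bigrams = set(zip(d, d[1:]))
    unigram_hits = sum(tok in d_tokens for tok in q)
    bigram_hits = sum(bg in d_bigrams for bg in zip(q, q[1:]))
    return unigram_hits + 2 * bigram_hits


def retrieve_then_rerank(query, docs, doc_vecs, k=2):
    """Return (ranked doc indices, number of pairwise scorer calls)."""
    q_vec = embed(query)
    # Stage 1: cheap similarity against vectors computed ahead of time.
    candidates = sorted(
        range(len(docs)),
        key=lambda i: dot(q_vec, doc_vecs[i]),
        reverse=True,
    )[:k]
    # Stage 2: the expensive pairwise scorer sees only the k candidates.
    scores = {i: pair_score(query, docs[i]) for i in candidates}
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    return ranked, len(scores)


DOCS = [
    "dense vector search at scale",
    "reranking search results with a cross encoder",
    "cooking pasta at home",
    "vector databases store dense embeddings",
]
# Computed once at indexing time, independent of any query.
DOC_VECS = [embed(doc) for doc in DOCS]

if __name__ == "__main__":
    ranked, pair_calls = retrieve_then_rerank("dense vector search", DOCS, DOC_VECS)
    print(ranked, pair_calls)
```

Scoring the whole corpus with `pair_score` would take `len(DOCS)` calls per query; with the two-stage setup it takes `k`, which is why the pairwise model ends up in the reranking slot.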