Hacker News
by ramoz 1778 days ago
I've scaled large transformer-based models that supplement a Lucene-based search engine. The architecture is an ensemble: Lucene results are first-class, and the models then adjust the similarity rankings on top of them.

It looks a lot like this: https://huggingface.co/blog/bert-cpu-scaling-part-1

We have to store the large "index" embeddings on SSDs and use LevelDB to retrieve the stored values for the Lucene results.
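
A minimal sketch of that kind of flow, under stated assumptions: Lucene returns candidate hits, each hit's embedding is fetched from a key-value store, and the Lucene score is blended with a similarity score. Everything here is hypothetical and for illustration only. A plain dict stands in for the LevelDB store, and the `rerank` function, the `weight` parameter, and the linear blend with cosine similarity are assumptions, not the actual production setup.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query_vec, lucene_hits, embedding_store, weight=0.5):
    """Re-rank Lucene hits by blending their scores with embedding similarity.

    lucene_hits: list of (doc_id, lucene_score) pairs, as returned by the search engine.
    embedding_store: any mapping with .get(doc_id) -> vector (a dict here; a
        key-value store such as LevelDB would play this role in practice).
    weight: hypothetical blend factor; 0.0 keeps the pure Lucene order.
    """
    if not lucene_hits:
        return []
    # Normalize Lucene scores to [0, 1] so they are comparable to cosine similarity.
    top = max(score for _, score in lucene_hits) or 1.0
    scored = []
    for doc_id, lucene_score in lucene_hits:
        vec = embedding_store.get(doc_id)
        # A hit with no stored embedding falls back to its Lucene score alone.
        sim = cosine(query_vec, vec) if vec is not None else 0.0
        blended = (1 - weight) * (lucene_score / top) + weight * sim
        scored.append((doc_id, blended))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: three Lucene hits and their (made-up) 2-d embeddings.
store = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
hits = [("a", 4.0), ("b", 3.0), ("c", 2.0)]
ranked = rerank([0.0, 1.0], hits, store)
print([doc_id for doc_id, _ in ranked])
```

Because only the Lucene candidates are looked up, the store sees a handful of point reads per query instead of a scan over every embedding, which is the access pattern a key-value store on SSD handles well.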