|
|
|
|
|
by peterstjohn
878 days ago
|
|
I no longer work there, but Lucidworks has had embedding training as a first-class feature in Fusion since January 2020 (I know because I wrapped up adding it just as COVID became a thing). We definitely saw that even with just slightly out-of-band use of language - e.g. in e-commerce, things like "RD TSHRT XS", embedding search with open (and closed) models would fall below bog-standard* BM25 lexical search. Once you trained a model, performance would kick up above lexical search…and if you combined lexical _and_ vector search, things were great. Also, a member on our team developed an amazing RNN-based model that still today beats the pants off most embedding models when it comes to speed, and is no slouch on CPU either… (* I'm being harsh on BM25 - it is a baseline that people often forget in vector search, but it can be a tough one to beat at times) |
|