by reissbaker
1022 days ago
Oh, I wasn't suggesting using a vector DB. Personally, I just iterate through the corpus and check cosine similarity with a for loop. If by "quantized e5 model small enough to fit in a serverless function" you mean e5-small-v2, FYI it actually underperforms just calling OpenAI for embeddings (text-embedding-ada-002) on the HuggingFace MTEB benchmarks. And that definitely doesn't preclude using a doc2query-style approach to preprocess the documents before running them through the pretrained embedding model, if you're comparing e.g. questions to answers rather than doing raw document-to-document similarity. (Of course a custom-trained model will be more efficient! In fact, the original doc2query paper in 2019 used a custom-trained model for step 1, as did many enhancements on it, e.g. docT5query. What's neat is that with the advent of really good pretrained LLMs, you can get results approximating that without training your own models, in about 5 minutes of work.)
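
For anyone wondering what "a for loop" looks like in practice, here is a minimal sketch, assuming the embeddings have already been computed (by ada-002, e5, or anything else; no model is called here). The names `cosine` and `search` and the `(doc_id, vector)` index layout are hypothetical. To cover the doc2query-style case, a document may appear several times in the index (one vector per generated question), and each document keeps its best score.

```python
import math
from collections import defaultdict


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 3):
    """Brute-force search: score every indexed vector with a plain for loop.

    `index` is a hypothetical list of (doc_id, vector) pairs. With doc2query-style
    preprocessing, one document can have several vectors (one per generated
    question), so we keep the best-matching score per document.
    """
    best: dict[str, float] = defaultdict(lambda: -1.0)
    for doc_id, vec in index:
        score = cosine(query_vec, vec)
        if score > best[doc_id]:
            best[doc_id] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]


# Toy 3-d vectors stand in for real embeddings of generated questions.
index = [
    ("refunds", [1.0, 0.0, 0.0]),
    ("refunds", [0.9, 0.1, 0.0]),
    ("shipping", [0.0, 1.0, 0.0]),
]
print(search([1.0, 0.05, 0.0], index, k=2))
```

For corpora in the thousands or low tens of thousands of vectors, a linear scan like this is usually fast enough that a vector DB adds more operational weight than it saves.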
Considering the LLM still does the final pass, and LLM latency depends on output length, I find the UX significantly improved by just doing reranking in-process.
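
A rough sketch of what "reranking in-process" can mean: re-score the retrieved candidates locally and pass only the top few to the LLM. The scoring here is an assumption for illustration: a hypothetical blend of the embedding score with simple query-word overlap. A locally loaded cross-encoder or any other cheap scorer would slot into the same place; `rerank`, `tokenize`, and `alpha` are made-up names.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for real tokenization."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rerank(query: str, candidates: list[tuple[str, str, float]],
           keep: int = 2, alpha: float = 0.5) -> list[tuple[float, str, str]]:
    """Re-score (doc_id, text, embedding_score) candidates without any network call.

    The lexical-overlap term is a hypothetical cheap reranker; `alpha` weights
    the original embedding score against it.
    """
    q = tokenize(query)
    rescored = []
    for doc_id, text, emb_score in candidates:
        overlap = len(q & tokenize(text)) / len(q) if q else 0.0
        rescored.append((alpha * emb_score + (1 - alpha) * overlap, doc_id, text))
    rescored.sort(reverse=True)  # highest combined score first
    return rescored[:keep]       # only these go into the LLM prompt


candidates = [
    ("a", "shipping times for orders", 0.80),
    ("b", "how to request a refund", 0.78),
    ("c", "refund policy for damaged orders", 0.60),
]
for score, doc_id, text in rerank("refund for damaged orders", candidates):
    print(doc_id, round(score, 3), text)
```

Because this runs in the same process as the retrieval step, it adds only the cost of a local scoring pass, which is small next to the time the LLM spends generating output.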
I think there's been a bit of whiplash, where people went from gatekeeping "hard ML" to "I can shove this all at a REST API", but there's a golden path lying in between for use cases where UX matters.
I even fall back to old-school NLP (e.g. ML-free POS taggers that are glorified wordlists) for LLM tasks and end up with significantly improved performance for almost zero additional effort.
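
To make "glorified wordlist POS tagger" concrete, here is a minimal sketch: a dictionary lookup with a few suffix rules as fallback. The tiny `WORDLIST`, the suffix rules, and the `keywords` helper are all hypothetical; one illustrative use, assumed here, is pulling candidate nouns out of a user message before building a prompt.

```python
import re

# Hypothetical, tiny wordlist; a real one would have thousands of entries.
WORDLIST = {
    "the": "DET", "a": "DET", "an": "DET",
    "my": "PRON", "your": "PRON",
    "is": "VERB", "was": "VERB",
    "in": "ADP", "on": "ADP", "for": "ADP",
    "order": "NOUN", "orders": "NOUN", "refund": "NOUN",
}


def tag(text: str) -> list[tuple[str, str]]:
    """Tag each word by wordlist lookup, falling back to crude suffix rules."""
    tags = []
    for word in re.findall(r"[A-Za-z']+", text):
        lower = word.lower()
        if lower in WORDLIST:
            pos = WORDLIST[lower]
        elif lower.endswith("ly"):
            pos = "ADV"
        elif lower.endswith(("ing", "ed")):
            pos = "VERB"
        else:
            pos = "NOUN"  # default guess for unknown words
        tags.append((word, pos))
    return tags


def keywords(text: str) -> list[str]:
    """Keep only the words tagged as nouns."""
    return [word for word, pos in tag(text) if pos == "NOUN"]


print(keywords("The refund for my damaged order arrived quickly"))
```

It mis-tags plenty (e.g. "damaged" comes out as a verb), but it runs in microseconds and needs no model at all, which is the trade-off being described.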