by krishadi 986 days ago
Latency from the embedding model is still going to be the bottleneck for performance, however fast the DB is. On top of that, the overhead of synthesising answers and summaries with an LLM is going to weigh you down.
1 comments

Embeddings can be precomputed. Imagine a related-videos section on a video-sharing site: each video's embedding is relatively static.
If you are building a search engine or a QA bot, the embedding of the query still needs to be calculated at request time. The results do depend on the quality of the model, and if you are using a large one, it does take time.