by cmenge
253 days ago
We're processing tenders for the construction industry. That comes with a 'free' bucket sort from the start, namely that people practically always work on a single tender at a time. Still, that single tender can be on the order of a billion tokens. Even if an LLM supported that insane context window, that's roughly 4 GB that needs to be moved, and at current LLM prices a single inference pass would cost thousands of dollars. I detailed this a bit more at https://www.tenderstrike.com/en/blog/billion-token-tender-ra... And that's just one (though, granted, very large) tender. For the corpus of a larger company, you'd probably be looking at trillions of tokens. While I agree that delivering tiny, chopped-up parts of context to the LLM might not be a good strategy anymore, sending thousands of ultimately irrelevant pages isn't either, and embeddings definitely give you a much better search experience than classic BM25 text search alone.
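For a rough sense of the arithmetic, here is a back-of-the-envelope sketch. The 4 bytes per token matches the "billion tokens, roughly 4 GB" figure above; the price of $3 per million input tokens is purely an assumed placeholder, not a quote from any provider, so plug in your own number.

```python
def full_context_cost(tokens, bytes_per_token=4.0, usd_per_million_tokens=3.0):
    """Estimate payload size and input cost of sending `tokens` in one pass.

    bytes_per_token: rough average, consistent with ~4 GB per billion tokens.
    usd_per_million_tokens: ASSUMED placeholder price, not a real quote.
    """
    gigabytes = tokens * bytes_per_token / 1e9
    usd = tokens / 1e6 * usd_per_million_tokens
    return gigabytes, usd


# One very large tender, sent in full for a single question.
gb, usd = full_context_cost(1_000_000_000)
print(f"{gb:.1f} GB moved, about ${usd:,.0f} per pass")
```

Under those assumptions a single question against the full tender already lands in the thousands of dollars, and that is before any follow-up question repeats the whole pass.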
We ran into context size limits with embeddings in our case: we were working with large technical manuals. Gemini was the first to offer a 1M-token context window, but for some reason its embedding input window is tiny. I suspect embeddings start to break down when there's too much information in a single input.
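One common way around a small embedding window is to split a long document into overlapping windows and embed each one separately. A minimal sketch, not necessarily what anyone in this thread runs: `split_into_windows` is a hypothetical helper, the window and overlap sizes are made-up defaults, and `tokens` is assumed to be a list produced by whatever tokenizer matches your embedding model.

```python
def split_into_windows(tokens, window=2048, overlap=256):
    """Split a token list into windows of at most `window` tokens.

    Consecutive windows share `overlap` tokens so that a sentence cut at a
    boundary still appears whole in at least one window.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be >= 0 and smaller than window")
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


# Toy example: 10 "tokens", windows of 4 with 1 token of overlap.
for chunk in split_into_windows(list(range(10)), window=4, overlap=1):
    print(chunk)
```

Each window then gets its own vector, which keeps every input under the model's limit at the price of more vectors to store and search.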