by appenz
1148 days ago
Summarization is much more expensive than vector DBs. Assume you have 1M tokens of context. You could run all of it through GPT-4 and summarize the information, but it would cost $60 (based on current prices) and take tens of minutes of GPU time to do the inference. Disclaimer: I work at a16z on the infra team, so consider me biased.
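To make the arithmetic explicit, here is a rough sketch that uses only the two figures quoted above ($60 for 1M tokens). The function names are hypothetical, and the per-1K price is simply what those two numbers imply, not an official price list.

```python
# Figures quoted in the comment; everything else here is illustrative.
CORPUS_TOKENS = 1_000_000
QUOTED_COST_USD = 60.0


def implied_price_per_1k(total_cost_usd: float, total_tokens: int) -> float:
    """Per-1K-token price implied by a total cost and a token count."""
    return total_cost_usd / (total_tokens / 1000)


def summarization_cost(tokens: int, price_per_1k: float) -> float:
    """Cost of one full pass over `tokens` input tokens at a flat per-1K price."""
    return tokens / 1000 * price_per_1k


price = implied_price_per_1k(QUOTED_COST_USD, CORPUS_TOKENS)
print(f"implied price: ${price:.2f} per 1K tokens")
print(f"one pass over 10M tokens: ${summarization_cost(10_000_000, price):,.2f}")
```

This counts input tokens only and assumes a single pass, so treat it as a floor on the cost rather than an estimate.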
As for a corpus of documents (which is what you are presumably talking about), there are a couple of problems with what you are saying:
First, you are implying that the content is always new. That's not true for many of the cases folks are talking about solving (like technical support or customer support), so summarizing the corpus is a one-time fee. You might rerun it periodically for updates.
Second, there is an assumption that a basic semantic search is the best way to search documents to find the most relevant content. That was questionable even before LLMs existed, but with LLMs you are basically assuming that a cosine similarity search over your vectors does better than an LLM can do with a simple table of contents and the question "where should I search?" I haven't seen anyone do a detailed study, but the implicit assumption that semantic search is the best approach for text could easily be a bad one.
Third, it assumes the quantum of data to search through is astronomically large and/or growing faster than the almost certain decreases in inference cost and increases in input token limits. That will be true for some subset of things, but it's unlikely to be many, and in the cases where it is true they'll do something more sophisticated than embeddings and embedding search. They'll probably fine-tune the underlying model on an ongoing basis.
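For anyone unfamiliar with what the "cosine similarity search" in the second point actually does, here is a minimal sketch. The three-dimensional vectors and document names are made up for illustration. A real system would get its vectors from an embedding model and would likely use an approximate index rather than a full scan.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Brute-force search: score every document, return the k best names."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical toy "embeddings"; real ones have hundreds of dimensions.
DOCS = {
    "reset-password": [0.9, 0.1, 0.0],
    "billing-faq": [0.1, 0.9, 0.1],
    "api-limits": [0.0, 0.2, 0.9],
}

query = [0.8, 0.2, 0.1]  # stands in for an embedded user question
print(top_k(query, DOCS, k=2))
```

The point being argued is that this ranking step knows nothing about the question beyond vector geometry, whereas an LLM handed a table of contents can reason about where the answer is likely to be.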
Regardless, the post you guys wrote seems... like a stretch as a definition of what this really is. And, at least on the surface, vector databases appear to be commodity infra. Pinecone might be growing fast now, but how do they ever make much money above their costs? But you guys seem smart, so maybe there is something there?