by ivanstegic
1112 days ago
This is a great idea and I would love to see something like this succeed! If I understand how all of these OpenAI-dependent apps work, none of them actually host the LLM or do any kind of heavy processing themselves. AFAIK, they’re all packaging your data, submitting it to OpenAI on every request, and then repackaging the output. There’s no real indexing, no real tangible thing; you have to start from scratch every time. So it’s likely going to be very expensive and super slow. Or am I wrong and I’ve missed something here?
I think the most common design pattern nowadays goes like this:
1. Chunk all your data (e.g. per paragraph of content)
2. Generate an embedding for each chunk
3. Index embeddings in a vector database
4. When a query comes in, find the chunks relevant to the query (based on embedding similarity) and ONLY send those relevant chunks + the query to an LLM to formulate the answer
Quickly glancing through the repository from this post, I can see that it also follows this pattern. It uses OpenAI's embedding API for step 2 and Pinecone as the vector database for step 3.
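To make the four steps concrete, here's a minimal sketch in Python. It is not the code from the repository: `embed` is a toy bag-of-words stand-in for a real embedding API, the in-memory list returned by `build_index` stands in for a vector database, and all function names are made up for illustration. The point is only that the expensive LLM call in step 4 sees a handful of chunks, not the whole dataset.

```python
import math
import re
from collections import Counter


def chunk(text):
    """Step 1: split the data into paragraph-sized chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def embed(text):
    """Step 2 (toy stand-in): a bag-of-words count vector.
    A real system would call an embedding model here instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(text):
    """Step 3 (toy stand-in): an in-memory list of (embedding, chunk)
    pairs, computed once up front rather than on every request."""
    return [(embed(c), c) for c in chunk(text)]


def retrieve(index, query, k=2):
    """Step 4a: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [c for _, c in ranked[:k]]


def build_prompt(index, query, k=2):
    """Step 4b: only the relevant chunks plus the query go to the LLM."""
    context = "\n---\n".join(retrieve(index, query, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The index is built once, so each incoming query costs one embedding lookup plus one LLM call over a small prompt, which is what keeps this pattern from being as slow and expensive as resending everything each time.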