by ramoz
941 days ago
Very bearish on these frameworks and abstractions. Yes, they're obviously useful for prototyping and for creating hype articles & tweets with fun examples. However, any engineer is capable of doing their own RAG with the same effort (minimal data extraction using the ancient PDF/scrape tools that are still the open SOTA, or cloud OCR for the best results -> brute-force chunking -> embed -> load into an ANN index with a complementary metadata store). Anyone doing prod needs to know the intricacies and make advanced engineering decisions. There’s a reason there aren’t similar end-to-end abstractions over creating Lucene (Solr/Elastic) indexes. Hmm, why not, after many decades? … In reality, the RAG tech is not entirely novel; it’s ETL. And complex ETL is often a serious data curation effort. LLMs are the closest thing to enabling better data curation, and as long as you aren’t competing with OpenAI (arguably any commercial system is), you can use ChatGPT to create your chunks. Beyond this, embedding strategies are nice to abstract, but the best approach to embeddings is still to create your own and figure out contextual integration on your own. Creating your own can also just mean fine-tuning. Inference is often an ensemble, depending on your use case.
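To make the "same effort" claim concrete, here is a rough sketch of the chunk -> embed -> index-with-metadata steps in plain Python. Everything in it is a stand-in I made up for illustration: `chunk`, `embed`, `TinyIndex` and `build_index` are hypothetical names, the hashed bag-of-words `embed` is a toy replacement for a real embedding model, and the exhaustive cosine scan is a placeholder for an actual ANN index. Extraction/OCR is assumed to have already produced plain text.

```python
import hashlib
import math
from collections import Counter


def chunk(text, size=200, overlap=50):
    """Brute-force chunking: fixed-size character windows with overlap."""
    step = size - overlap
    starts = range(0, max(len(text) - overlap, 1), step)
    return [text[i:i + size] for i in starts if text[i:i + size]]


def embed(text, dim=64):
    """Toy embedding (assumption): hashed bag-of-words, L2-normalized.
    A real pipeline would call an embedding model here."""
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class TinyIndex:
    """Vectors plus a complementary metadata store, keyed by chunk id.
    Search is an exact scan, standing in for an ANN library."""

    def __init__(self):
        self.vectors = {}
        self.metadata = {}

    def add(self, chunk_id, vector, meta):
        self.vectors[chunk_id] = vector
        self.metadata[chunk_id] = meta

    def search(self, query_vector, k=3):
        scored = [
            (sum(a * b for a, b in zip(query_vector, vec)), chunk_id)
            for chunk_id, vec in self.vectors.items()
        ]
        scored.sort(reverse=True)
        return [(cid, score, self.metadata[cid]) for score, cid in scored[:k]]


def build_index(docs, size=200, overlap=50):
    """docs maps a source name to its already-extracted text."""
    index = TinyIndex()
    for source, text in docs.items():
        for n, piece in enumerate(chunk(text, size, overlap)):
            meta = {"source": source, "position": n, "text": piece}
            index.add(f"{source}#{n}", embed(piece), meta)
    return index


docs = {
    "lucene.txt": "lucene index shards merge segments",
    "fruit.txt": "apples oranges bananas",
}
index = build_index(docs)
hits = index.search(embed("lucene index shards merge segments"), k=2)
```

Each hit comes back as `(chunk_id, score, metadata)`, so the metadata store is what lets you map a vector match back to its source and position. None of this touches the hard parts (extraction quality, chunk boundaries, which embedding to use), which is where the prod decisions actually are.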
|
Probably the main point where I disagree with you is that RAG is just ETL. If that were the case, all of the AI apps people are building would be AMAZING, because we solved the ETL problem years ago. Yet app after app is released with issues like hallucinations and incorrect data. IMO the second you insert a non-deterministic entity in the middle of an ETL pipeline, it is no longer just ETL. To try to add value here, our focus has been on adding capabilities to the framework around data synchronization (which is actually more of a vector management problem), contextualization of data through metadata, and retrieval (this last part being where we have spent the least time to date, but are currently spending the most).