|
This is my problem with every end-to-end system I've seen in this space. Even when building these systems from scratch, I find that all of the hard parts are just normal data infrastructure problems. The "AI" part takes a small fraction of the effort to deliver, even when building the RAG part directly on top of huggingface/transformers. I have also dealt with what you're describing, but IME it goes much further once you go to prod. The ingestion part is even messier, in ways these kinds of platforms don't seem to help with. When managing multiple tools in prod with overlapping and non-constant data sources (say, two tools that both need to know the price of a product, which can change at any time), I need both tools to be built on the same source of truth, and I need that source of truth to be fed by our data infra in real time, with relevant documents replaced more or less atomically. Then I have tools with varying levels of permissioning on those overlapping data sources. Say you have two tools in a classroom: one helps the student based on their work, and another is used by the TA or teacher to understand students' answers in a large course. They have overlapping data needs on otherwise private data, and this kind of permissioning layer, which is pretty trivial in a normal webapp, has IME had to be implemented basically from scratch on top of the vector db and retrieval system. Then experimentation, eval, testing, and releases are the hardest and most underserved parts. Only relatively recently did anyone even seem to be talking about eval as a problem to aspire to solve. There's a pretty interesting and novel interplay between the problems of production ML eval, but with potentially sparse data, and conventional unit testing. This is the area we had to put the most of our own thought into before I felt reasonably confident putting anything into prod.
FWIW we built our own internal platform on top of langchain a while back. It seemed like a good balance: the right level of abstraction for our use cases, and solid productivity gains from shared effort. I think this is a really interesting problem space, but yeah, I'm skeptical of all of these platforms, as they always seem to promise a lot more than they deliver. Superficially it looks like there has been a lot of progress on tooling, but I built a production service based on vector search in 2018 and it really isn't that much easier today. It works better because the models are so much better, but the tools and frameworks don't help that much with the hard parts, to my surprise honestly. Perhaps I'm just not the target user and am being excessively critical, but I keep having to deal with execs and product people throwing these frameworks at us internally without understanding the gap between what is hard about building these kinds of services in prod and what these kinds of tools make easier vs harder. |
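To make the shared source of truth and permissioning points above concrete, here is a minimal in-memory sketch, not anyone's actual implementation. It assumes a single index shared by several tools, documents tagged with the roles allowed to read them, and a copy-then-swap update so readers never see a half-applied batch. The names `Doc`, `SharedIndex`, `replace`, and `search` are hypothetical, and a real vector db would need its own equivalents of the swap and the filter.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    embedding: tuple[float, ...]
    allowed_roles: frozenset[str]  # hypothetical permission model: role names


def _dot(a, b):
    """Plain dot product used as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


class SharedIndex:
    """One source of truth that several tools query (illustrative only)."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def replace(self, docs):
        """Apply a batch of new document versions as a single swap."""
        with self._lock:
            updated = dict(self._docs)  # copy, then mutate the copy
            for doc in docs:
                updated[doc.doc_id] = doc
            self._docs = updated  # readers see the old or new dict, never a mix

    def search(self, query, role, k=3):
        """Return the top-k documents that this role is allowed to read."""
        snapshot = self._docs  # one consistent view for the whole query
        visible = [d for d in snapshot.values() if role in d.allowed_roles]
        visible.sort(key=lambda d: _dot(query, d.embedding), reverse=True)
        return visible[:k]


index = SharedIndex()
index.replace([
    Doc("price", "Widget costs $10", (1.0, 0.0), frozenset({"student", "teacher"})),
    Doc("answers", "All student answers", (0.0, 1.0), frozenset({"teacher"})),
])
student_hits = index.search((0.0, 1.0), role="student")
teacher_hits = index.search((0.0, 1.0), role="teacher")

# A price change replaces the document for every tool at once.
index.replace([
    Doc("price", "Widget costs $12", (1.0, 0.0), frozenset({"student", "teacher"})),
])
```

Filtering before ranking means a restricted document never enters the candidate set for a role that cannot read it, which is the property the classroom example needs. The hard parts described above (feeding the swap from real-time data infra, and doing this at scale) are exactly what this sketch leaves out.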
The infra challenges are real. They are what I have struggled with most in providing high-quality support for early users. Most want to be able to reliably firehose tens to hundreds of GB of data through a brittle multistep pipeline. This was something I struggled with when building AgentSearch [https://huggingface.co/datasets/SciPhi/AgentSearch-V1] with LOCAL data, so introducing the networking component only makes things that much harder.
I think we have a lot of work to do to robustly solve this problem, but I'm confident that there is an opportunity to build a framework that results in net positives for the developer.
FWIW, your feedback would be invaluable as the project continues to grow.