by ibains 1077 days ago
It is pointless - LlamaIndex and LangChain are re-inventing ETL - why use them when you have robust technology already?

1. You ETL your documents into a vector database, and you run this pipeline every day to keep it up to date. You can run scalable, robust pipelines on Spark for this.

2. You have a streaming inference pipeline whose components make API calls (agents) and transform data between those calls. This is Spark streaming.
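To make step 1 concrete, here is a minimal sketch in plain Python rather than Spark. Everything in it is an assumption for illustration: `chunk`, `toy_embed`, `InMemoryVectorStore`, and `run_etl` are hypothetical names, the hash-based embedding stands in for a real embedding model, and the in-memory dict stands in for a real vector database. On Spark, each function would roughly correspond to one transformation over a DataFrame of documents.

```python
import hashlib
import math
from dataclasses import dataclass, field


def chunk(text: str, size: int = 40) -> list[str]:
    """Transform step: split a document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Stand-in for an embedding API call: a deterministic, hash-based unit vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryVectorStore:
    """Stand-in for a vector database (hypothetical, not a real client)."""
    rows: dict[str, tuple[list[float], str]] = field(default_factory=dict)

    def upsert(self, key: str, vector: list[float], text: str) -> None:
        # Keyed upserts mean a daily rerun overwrites rows instead of duplicating them.
        self.rows[key] = (vector, text)

    def query(self, vector: list[float], k: int = 1) -> list[str]:
        # Rank stored chunks by dot product (cosine similarity for unit vectors).
        scored = sorted(
            self.rows.values(),
            key=lambda row: -sum(a * b for a, b in zip(vector, row[0])),
        )
        return [text for _, text in scored[:k]]


def run_etl(docs: dict[str, str], store: InMemoryVectorStore) -> int:
    """Extract -> chunk -> embed -> load. Returns the number of chunks written."""
    written = 0
    for doc_id, text in docs.items():
        for i, piece in enumerate(chunk(text)):
            store.upsert(f"{doc_id}#{i}", toy_embed(piece), piece)
            written += 1
    return written
```

Calling `run_etl` twice on the same input leaves the store with the same rows, which is the property you want from a pipeline that is rerun on a schedule.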

Prophecy is working with large enterprises to implement generative AI use cases, but they don’t talk so much on HN.

Here’s our talk from Data+AI Summit: Build a Generative AI App on Enterprise Data in 13 Minutes https://www.youtube.com/watch?v=1exLfT-b-GM

Here’s a blog/demo https://www.prophecy.io/blog/prophecy-generative-ai-platform...

1 comments

We also do platform & customer work there (cool pipelines to feed louie.ai or real-time headless versions), and agreed, those pipelines have simple uses of LLMs where langchain is mostly useful just for vendor neutrality. Think BYO LLM, as it is now a zoo. Basically Apache NiFi or Spark streaming with simple LLM & vector DB callouts. Our harder work here is more at the data engineering level.

But... a lot of our louie.ai work happens for less trivial scenarios where it isn't just the ETL NLP 2.0 tier. That logic is much more complicated, so structured programming abstractions matter a LOT more for AI-style business logic. Think talk to your data and generate on-the-fly analytics pushdown with an interactive data viz UI. That's... a lot of code.

I agree that it's a little silly, but I mostly use it to abstract over BYO LLMs and extract information from documents. It's nice to be able to quickly prototype something and swap out the underlying language model rather than set up a whole pipeline with Apache Tika, ETL, etc. Once the idea is feasible, then sure.

That said, langchain is really inefficient and I often find I can re-implement the pieces I need much faster than dealing with langchain's bugs and performance issues.
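For illustration, the "abstract over BYO LLMs" piece can be sketched in a few lines. This is a sketch under stated assumptions, not langchain's API: the `LLM` protocol, the `EchoLLM` and `UpperLLM` fake backends, and `extract_field` are all hypothetical names, and a real backend would make a network call to a vendor inside `complete`.

```python
from typing import Protocol


class LLM(Protocol):
    """Hypothetical vendor-neutral interface: anything with complete() qualifies."""

    def complete(self, prompt: str) -> str: ...


class EchoLLM:
    """Fake backend for prototyping: returns the last prompt line unchanged."""

    def complete(self, prompt: str) -> str:
        return prompt.splitlines()[-1]


class UpperLLM:
    """A second fake backend, to show that swapping models needs no other changes."""

    def complete(self, prompt: str) -> str:
        return prompt.splitlines()[-1].upper()


def extract_field(llm: LLM, document: str, field_name: str) -> str:
    """Build an extraction prompt and delegate to whichever backend was passed in."""
    prompt = f"Extract the {field_name} from the document below.\n{document}"
    return llm.complete(prompt).strip()
```

The calling code depends only on `complete`, so `extract_field(EchoLLM(), doc, "invoice number")` and `extract_field(UpperLLM(), doc, "invoice number")` differ in nothing but the object passed in.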

That’s assuming you’re not using low-code. There are built-in connectors to read data, transform data, read/write to Pinecone, and make API calls to LLMs. It is much faster to prototype with Prophecy.io