Hacker News
by riter 1133 days ago
Not off-topic at all. After struggling with LangChain's hyper-opinionated class implementations, I agree.

In fact, this would be better off leveraging LlamaIndex. This is a proof of concept, and ultimately leveraging a library/framework affords the following:

- easy implementation of chunking strategies when you're unsure
- OpenAI helper functions
- embeddings and vector store management

Again, even with the above I struggled and had to implement the PGVector integration myself. Going into production, once I have my document retrieval strategy and prompt tuning optimized, I would never use LangChain, simply because of the bloat and the inflexible implementation of things like the PGVector class. The footprint is also massive, and the LLM part can be done in Golang with 5% of the footprint and 5% of the cloud costs.

So I actually agree with you :)


Thanks for the insights.

I wonder whether one even needs LlamaIndex.

From their site:

>Storing context in an easy-to-access format for prompt insertion.

>Dealing with prompt limitations (e.g. 4096 tokens for Davinci) when context is too big.

>Dealing with text splitting.

I'm not sure it wouldn't be easier to roll one's own for that.

I know a thing or two about the math behind LLMs, and all this software built around a few core ideas just seems like overkill...

When you mentioned PGVector, did you mean this repo, or is there a class within LangChain with the same name? https://github.com/pgvector/pgvector

You’re almost certainly going to have to write your own splitting code for anything nontrivial. LlamaIndex breaks down hard when there’s a lot of markup in the document, for example. You’ll also want control over the vector search strategy (just using the query or chunk embedding may not be enough).
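For markup-heavy documents, a minimal sketch of a hand-rolled splitter, assuming HTML-ish input: it strips tags with the standard library's `html.parser`, treats block-level tags as paragraph boundaries, drops `<script>`/`<style>` content, and packs paragraphs into chunks under a character budget. The tag list, the 500-character default, and the function names are illustrative assumptions, not LlamaIndex's behaviour.

```python
from html.parser import HTMLParser
from typing import List

# Tags treated as paragraph boundaries (an illustrative, not exhaustive, set).
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6",
              "tr", "br", "pre", "blockquote"}

class _TextExtractor(HTMLParser):
    """Collect visible text, inserting paragraph breaks at block-level tags."""
    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def html_to_paragraphs(html: str) -> List[str]:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse whitespace inside each paragraph and drop empty ones.
    paras = [" ".join(p.split()) for p in text.split("\n\n")]
    return [p for p in paras if p]

def pack_paragraphs(paras: List[str], max_chars: int = 500) -> List[str]:
    """Greedily pack whole paragraphs into chunks of at most max_chars."""
    chunks: List[str] = []
    current = ""
    for p in paras:
        # A single paragraph longer than the budget is hard-split.
        while len(p) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(p[:max_chars])
            p = p[max_chars:]
        candidate = f"{current}\n\n{p}" if current else p
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = p
    if current:
        chunks.append(current)
    return chunks
```

Keeping paragraph boundaries means a chunk rarely starts mid-sentence; the hard split is only a fallback for very long paragraphs.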
In terms of search store and engine, would you agree that pgvector is sufficient for most text-specific cases?
I agree. I mentioned in a thread below that these frameworks are useful for discovering the index-retrieval strategy that works best for your product.

On PGVector, I tried to use LangChain's class (https://python.langchain.com/en/latest/modules/indexes/vecto...), but it was highly opinionated and it didn't make sense to subclass it or implement its interfaces, so in this particular project I did it myself.

As part of implementing with SQLModel I absolutely leaned on https://github.com/pgvector/pgvector :)
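For anyone hand-rolling the pgvector side, a minimal sketch of the SQL involved, assuming a hypothetical `chunks` table with a `content` column and an `embedding` column (table name, column names, and the 1536 dimension are assumptions). It relies only on pgvector's `vector(n)` column type, its `'[1,2,3]'` text format, and the `<=>` cosine-distance operator; placeholders use the `%s` style that psycopg expects.

```python
from typing import Sequence, Tuple

def schema_sql(dim: int = 1536) -> str:
    """DDL for a hypothetical chunk table with a pgvector column of size dim."""
    return (
        "CREATE EXTENSION IF NOT EXISTS vector;\n"
        "CREATE TABLE IF NOT EXISTS chunks (\n"
        "    id bigserial PRIMARY KEY,\n"
        "    content text NOT NULL,\n"
        f"    embedding vector({dim}) NOT NULL\n"
        ");"
    )

def to_pgvector_literal(vec: Sequence[float]) -> str:
    """Render a sequence in pgvector's text format, e.g. '[1.0,2.5]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"

def nearest_chunks_query(query_embedding: Sequence[float], k: int = 5) -> Tuple[str, tuple]:
    """Parameterized k-nearest-neighbour query ordered by cosine distance (<=>)."""
    literal = to_pgvector_literal(query_embedding)
    sql = (
        "SELECT id, content, embedding <=> %s::vector AS distance "
        "FROM chunks "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    return sql, (literal, literal, k)
```

With psycopg, the pair would be passed as `cursor.execute(sql, params)`. At scale, an HNSW or IVFFlat index on `embedding` using `vector_cosine_ops` is what keeps the `ORDER BY ... LIMIT` fast.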

Thanks for the observation.

FWIW, individual classes are generally tiny, so we found using LangChain fine, and in places where we need to beef things up (chunking, not calling 'eval', ...) we write our own class/subclass. That way we can align with the community on the broader pieces and patterns, and reduce the technical risk that comes with smaller fly-by-night repos.

At the same time, the underlying APIs are super simple, so rolling your own entirely, with no framework, can make sense. We need to support businesses that want to plug in their own APIs and models, so that approach happens to be less attractive to us.

That said, purpose-built frameworks can be great. Our data agent has a headless tier that we are building fine with LangChain, benefiting from the ecosystem there, but I can imagine someone with more specific needs enjoying Rasa.

Splitting things is easy! Store dense vectors for chunks of 512 characters or so, and use an overlaid index of terms to set the context of the current conversation.
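A minimal sketch of that recipe, under loudly stated assumptions: chunks are fixed 512-character windows, `embed` is a toy hashed bag-of-words stand-in for a real dense embedding model, and the "overlaid index of terms" is read as an inverted index that narrows candidates to chunks sharing terms with the query and the last few conversation turns, before ranking them by cosine similarity. All names here are hypothetical.

```python
import math
import re
import zlib
from collections import defaultdict
from typing import Dict, List, Set

CHUNK_CHARS = 512  # "512 characters or so"
DIM = 256          # toy embedding size; a real model dictates its own

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def chunk(text: str, size: int = CHUNK_CHARS) -> List[str]:
    """Fixed-size character windows (no overlap, for brevity)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> List[float]:
    """Toy stand-in for a dense embedding model: hashed, L2-normalized term counts."""
    vec = [0.0] * DIM
    for tok in tokenize(text):
        vec[zlib.crc32(tok.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

class ChunkStore:
    def __init__(self) -> None:
        self.chunks: List[str] = []
        self.vectors: List[List[float]] = []
        self.term_index: Dict[str, Set[int]] = defaultdict(set)  # term -> chunk ids

    def add_document(self, text: str) -> None:
        for piece in chunk(text):
            idx = len(self.chunks)
            self.chunks.append(piece)
            self.vectors.append(embed(piece))
            for tok in set(tokenize(piece)):
                self.term_index[tok].add(idx)

    def search(self, query: str, conversation: List[str], k: int = 3) -> List[str]:
        # Terms from the query plus the last few turns select candidate chunks.
        terms = set(tokenize(query))
        for turn in conversation[-3:]:
            terms |= set(tokenize(turn))
        candidates: Set[int] = set()
        for term in terms:
            candidates |= self.term_index.get(term, set())
        if not candidates:  # nothing matched: fall back to all chunks
            candidates = set(range(len(self.chunks)))
        # Rank candidates by cosine similarity (vectors are already normalized).
        q = embed(query)

        def score(i: int) -> float:
            return sum(a * b for a, b in zip(q, self.vectors[i]))

        ranked = sorted(candidates, key=score, reverse=True)
        return [self.chunks[i] for i in ranked[:k]]
```

The term filter is what lets earlier conversation turns steer retrieval even when the latest query is short or vague.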

Use Weaviate Cloud for the vector engine…

Ignoring footprint and bloat, the big problem you identify is inflexible class design. I wonder why that happened. Is it hard for LangChain to expose all the desired features of a tool like PGVector via its own class?
Someone needs to create a “LangChain, but less complicated” framework.
I sorta did this, feel free to check it out and let me know your thoughts!

On the main LangChain post (in January) that got traction on Hacker News, I left this comment (https://news.ycombinator.com/item?id=34422917), and it still holds true: a "simpler LangChain".

> To offer this code-style interface on top of LLMs, I made something similar to LangChain, but scoped it to focus only on the bare functional interface and the concept of a "prompt function", leaving the power of the "execution flow" to the language interpreter itself (in this case Python) so the user can build anything with it.
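To make the "prompt function" idea concrete, here is a minimal sketch. It is not lambdaprompt's actual API: a prompt template becomes an ordinary Python function, and composing calls is plain Python. lambdaprompt itself uses Jinja; this sketch uses `string.Template` to stay dependency-free, and `fake_complete` is a hypothetical stand-in for a real completion endpoint.

```python
from string import Template
from typing import Callable

def fake_complete(prompt: str) -> str:
    """Hypothetical backend: swap in a real LLM completion call here."""
    return f"<completion for: {prompt}>"

def prompt_function(template: str,
                    complete: Callable[[str], str] = fake_complete) -> Callable[..., str]:
    """Turn a prompt template into a function: kwargs fill the template, the LLM fills the result."""
    tmpl = Template(template)

    def call(**kwargs: str) -> str:
        return complete(tmpl.substitute(**kwargs))

    return call

summarize = prompt_function("Summarize in one sentence: $text")
translate = prompt_function("Translate to $lang: $text")

def summarize_then_translate(text: str, lang: str) -> str:
    # The "execution flow" is just Python: one prompt function's output feeds the next.
    return translate(lang=lang, text=summarize(text=text))
```

Because each prompt is just a callable, loops, conditionals, and retries come from the host language rather than from framework-specific chain classes.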

I made a really lightweight wrapper over requests and called it lambdaprompt (https://github.com/approximatelabs/lambdaprompt). It has served all of my personal use cases since I made it, including powering `sketch` (Copilot for pandas): https://github.com/approximatelabs/sketch

Core things it does: it uses Jinja templates, supports both sync and async, and, most importantly, treats LLM completion endpoints as "function calls" that you can compose and build structures around with plain Python. I also combined it with FastAPI so you can serve any template directly as a REST endpoint. It also offers callback hooks so you can log and trace execution graphs.
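The async side and the callback hooks for tracing execution graphs can be sketched together. Again, this is a hypothetical illustration, not lambdaprompt's real interface. A `traced` decorator reports each call's id, parent id, name, and output to registered hooks, and a `contextvars.ContextVar` tracks the active call so that nested and concurrent calls form a tree. `fake_complete_async` stands in for an async LLM client.

```python
import asyncio
import contextvars
import functools
import itertools
from string import Template
from typing import Any, Callable, Dict, List

# Registered hooks receive one event dict per finished call.
HOOKS: List[Callable[[Dict[str, Any]], None]] = []
# Id of the call currently executing in this context (None at top level).
_current_call = contextvars.ContextVar("current_call", default=None)
_ids = itertools.count(1)

async def fake_complete_async(prompt: str) -> str:
    """Hypothetical async LLM client: yields to the event loop, then uppercases the prompt."""
    await asyncio.sleep(0)
    return prompt.upper()

def traced(name: str):
    """Decorator: report each call's id, parent id, name and output to every hook."""
    def wrap(fn):
        @functools.wraps(fn)
        async def inner(*args, **kwargs):
            call_id = next(_ids)
            parent = _current_call.get()
            token = _current_call.set(call_id)
            try:
                result = await fn(*args, **kwargs)
            finally:
                _current_call.reset(token)
            for hook in HOOKS:
                hook({"id": call_id, "parent": parent, "name": name, "output": result})
            return result
        return inner
    return wrap

def prompt_function(name: str, template: str):
    """Hypothetical async prompt function: fill the template, await the completion."""
    tmpl = Template(template)

    @traced(name)
    async def call(**kwargs: str) -> str:
        return await fake_complete_async(tmpl.substitute(**kwargs))

    return call

summarize = prompt_function("summarize", "summary of $text")
title = prompt_function("title", "title for $text")

@traced("pipeline")
async def pipeline(text: str) -> List[str]:
    # gather() runs both prompt calls concurrently; each task copies the current
    # context, so both record this pipeline call as their parent.
    return list(await asyncio.gather(summarize(text=text), title(text=text)))

if __name__ == "__main__":
    HOOKS.append(print)  # log every event as it completes
    print(asyncio.run(pipeline("quarterly report")))
```

A hook that appends events to a list, or ships them to a tracing backend, is enough to rebuild the call graph afterwards from the `id`/`parent` pairs.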

Altogether it's only ~600 lines of Python.

I haven't had a chance to really push all the different examples out there, so I don't think it has seen much adoption beyond those who have given it a try.

I hope to get back to it sometime next week to add a local mode (e.g. now that all the smaller open-source models are available, I want to make them first-class).

The use cases and tooling around language models are still very immature. So any framework you build now will either look like bloatware or stay close to just calling an API.

The dust around language models needs to settle a bit, for a useful framework to emerge from it.

For our own use-cases, I built a framework from scratch, and it was the best decision we made.

> For our own use-cases, I built a framework from scratch, and it was the best decision we made.

My thinking precisely. So you just used the "raw" OpenAI (I presume?) API, and no other tech on top?

Exactly. The most important part was working with Jinja templating. So, openai + jinja2.
Very much agreed re: dust settling.

It makes no sense to deploy any of these libraries to prod as-is. Best to understand the configuration/workflow/tuning/etc. that fits your data and write it from scratch in Golang/Rust/whatever.

Are these computationally expensive operations? If not, Elixir could fit.
They are not all computationally expensive. The rate-limiting step here is the LLM API call itself, so async is definitely needed. The other aspect would be loading templates from the filesystem, which I assume needs to be optimized in the application.
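Both points can be sketched briefly, with assumptions labeled. Templates are read from a file path and memoized with `functools.lru_cache`, so the filesystem is hit once per template. Concurrent API calls are capped with an `asyncio.Semaphore`, because the LLM call, not local compute, is the bottleneck. `call_llm` is a hypothetical stand-in that only sleeps, and the concurrency limit of 5 is arbitrary.

```python
import asyncio
import functools
from pathlib import Path
from string import Template
from typing import List

@functools.lru_cache(maxsize=None)
def load_template(path: str) -> Template:
    """Read a prompt template from disk once; later calls are served from memory."""
    return Template(Path(path).read_text(encoding="utf-8"))

async def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an async LLM API call: network-bound, not CPU-bound."""
    await asyncio.sleep(0.1)
    return f"answer to: {prompt}"

async def answer_all(questions: List[str], template_path: str,
                     max_concurrency: int = 5) -> List[str]:
    """Fan out one LLM call per question, with at most max_concurrency in flight."""
    sem = asyncio.Semaphore(max_concurrency)
    tmpl = load_template(template_path)

    async def one(question: str) -> str:
        async with sem:
            return await call_llm(tmpl.substitute(question=question))

    return list(await asyncio.gather(*(one(q) for q in questions)))
```

One trade-off of the cache: edits to a template file are not picked up until the process restarts or `load_template.cache_clear()` is called.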
This came up recently via the DataMachina Substack:

https://blog.scottlogic.com/2023/05/04/langchain-mini.html

Thanks for the share, will check it out.
Lololol. I think this opportunity gets bigger after the $10m seed round. They'll likely double down and expand their footprint rather than shrink it.

Check out llama-index. It's purpose-built for document indexing and retrieval, with less focus on agents and "everything else".

That's pretty wild. I've been setting things up like this for about 5 years with just BERT or my own fine-tuned encoder-only systems. It should be done for free, not for millions... Can I get millions for running `ls` too?
What do you mean by "post 10m seed round"?

Do you mean if LlamaIndex starts raising VC money? I'm not sure; are they for-profit?

I was referring to Langchain who raised $10mm from Benchmark

https://blog.langchain.dev/announcing-our-10m-seed-round-led...

I'm fairly sure Jerry Liu (LlamaIndex founder) already has angel investors or will see enough traction to warrant a seed round.

But this is a turnkey LLM that is built on LangChain? A user doesn't need to dig into LangChain themselves, right?
To be clear (apologies if I haven't made it so), this is not an LLM. This is an implementation of Rasa that leverages LangChain under the hood.

A user technically does not need to dig into Langchain themselves, but they would want to if they find their query results sub-optimal.

There are many indexing strategies and surface-level parameters you could modify to tune the output. They are described in the README.md.