by data_maan 1141 days ago
I'm not sure what advantage a fairly comprehensive framework like Langchain gives you for this use case.

It starts to feel as if AI tech is slowly turning into web tech with a million tools and frameworks, so I'm just wondering whether all of these are needed and whether it isn't easier to code your own than to learn a foreign framework...


Not off-topic at all. After struggling with LangChain's hyper-opinionated implementation of classes I agree.

In fact, this is better off leveraging Llamaindex. This is a proof-of-concept and ultimately leveraging a library / framework helps afford the following:

- easy implementation of chunking strategies when you're unsure
- OpenAI helper functions
- embeddings and vector store management

Again, even with the above I struggled and had to implement PGVector myself. Going into production once I have my document retrieval strategy and prompt-tuning optimized, I would never use Langchain in production simply bc of the bloat and inflexible implementation of things like the PGVector class. Also the footprint is massive and the LLM part can be done in 5% of the footprint in Golang and 5% of the cloud costs.

So I actually agree with you :)

Thanks for the insights.

I wonder if one needs even LlamaIndex?

From their site:

>Storing context in an easy-to-access format for prompt insertion.

>Dealing with prompt limitations (e.g. 4096 tokens for Davinci) when context is too big.

>Dealing with text splitting.

Not sure if it isn't easier to roll one's own for that...?
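For a sense of scale, here is a minimal sketch of what rolling your own context handling could look like. The 4-characters-per-token estimate, the `reserved` margin, and every function name are assumptions made for illustration; a real tokenizer would give exact counts.

```python
def approx_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)


def pack_context(chunks: list[str], budget: int) -> list[str]:
    """Keep chunks in their given (ranked) order until the budget is used up."""
    packed, used = [], 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost > budget:
            break
        packed.append(chunk)
        used += cost
    return packed


def build_prompt(question: str, chunks: list[str],
                 limit: int = 4096, reserved: int = 512) -> str:
    # `reserved` leaves room for the question, instructions, and the answer.
    context = pack_context(chunks, limit - reserved)
    return ("Context:\n" + "\n---\n".join(context)
            + f"\n\nQuestion: {question}\nAnswer:")
```

The chunks are assumed to arrive already ranked by relevance, so the packing step only decides where to cut.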

I know a thing or two about the math behind LLMs, and all this software built around a few core ideas just seems to be a lot of overkill...

When you mentioned PGVector, were you referring to this repo, or is there a class within LangChain that has the same name? https://github.com/pgvector/pgvector

You’re almost certainly going to have to write your own splitting code for anything nontrivial. LlamaIndex breaks down hard when there’s a lot of markup in the document, for example. You’ll also want control over the vector search strategy (just using the query or chunk embedding may not be enough)
in terms of search store and engine, would you agree that pgvector is sufficient for most text-specific cases?
I agree. I mentioned in a thread below that these frameworks are useful for discovering the index-retrieval strategy that works best for your product.

On PGVector, I tried to use LangChain's class (https://python.langchain.com/en/latest/modules/indexes/vecto...) but it was highly opinionated and it didn't make sense to subclass nor implement interfaces so in this particular project I did it myself.

As part of implementing with SQLModel I absolutely leaned on https://github.com/pgvector/pgvector :)
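As a rough idea of how little is needed to query pgvector directly, the sketch below only builds the SQL and the vector literal; it does not use SQLModel and is not the code from this project. The table and column names (`documents`, `id`, `content`, `embedding`) and the psycopg-style `%s` placeholders are assumptions. The three operators are the distance operators documented in the pgvector README.

```python
# pgvector distance operators: L2 distance, cosine distance,
# and negative inner product.
OPERATORS = {"l2": "<->", "cosine": "<=>", "inner_product": "<#>"}


def to_vector_literal(embedding) -> str:
    """Format a sequence of numbers in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def build_similarity_query(table: str, column: str, metric: str = "cosine") -> str:
    # Identifiers cannot be bound as query parameters, so validate them here.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    op = OPERATORS[metric]
    # Hypothetical schema: the table has `id` and `content` columns.
    return (
        f"SELECT id, content FROM {table} "
        f"ORDER BY {column} {op} %s::vector LIMIT %s"
    )
```

With a driver such as psycopg, the query would be executed with `(to_vector_literal(query_embedding), k)` as parameters, which keeps the schema entirely under your own control.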

Thanks for the observation.

FWIW, individual classes are generally tiny, so we found using langchain is fine and then for places we need to beef up (chunking, not calling 'eval', ...), we do our own class/subclass. That way we can align with community for broader pieces and patterns, and decrease technical risks from smaller fly-by-night repos.

At the same time, the underlying APIs are super simple, so just rolling your own entirely, with no framework, can make sense. We need to deal with businesses wanting to plug in their own APIs & models, so that happens to be less attractive to us.

That said, purpose built frameworks can be great. Our data agent has a headless tier and we are building it fine with langchain, and benefiting from the ecosystem there, but I can imagine someone with more specific needs enjoying rasa..

Splitting things is easy! Store the dense vectors of 512 characters or so and use an overlaid index of terms to set context of the current conversation.
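A minimal sketch of that idea, assuming fixed-size character windows with a small overlap and a plain inverted index of terms. The overlap size, the tokenizing regex, and the function names are illustrative assumptions; the embedding step itself is left out.

```python
import re
from collections import defaultdict


def split_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Cut text into windows of `size` characters that share `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def build_term_index(chunks: list[str]) -> dict[str, set[int]]:
    """Map each lowercase term to the ids of the chunks that contain it."""
    index = defaultdict(set)
    for chunk_id, chunk in enumerate(chunks):
        for term in re.findall(r"[a-z0-9]+", chunk.lower()):
            index[term].add(chunk_id)
    return index


def candidate_chunks(index: dict[str, set[int]], conversation: str) -> list[int]:
    """Return ids of chunks sharing at least one term with the conversation."""
    ids = set()
    for term in re.findall(r"[a-z0-9]+", conversation.lower()):
        ids |= index.get(term, set())
    return sorted(ids)
```

The term index narrows the candidate set cheaply; the dense vectors would then rank only those candidates.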

Use Weaviate Cloud for the vector engine…

Ignoring footprint and bloat, the big problem you identify is inflexible class design. I wonder why it happened? Is it hard for langchain to expose all the desired features of a tool like PGVector via its own class?
Someone needs to create a “Langchain, but less complicated” framework
I sorta did this, feel free to check it out and let me know your thoughts!

On the main langchain post (In January) that got the traction on hackernews, i left this comment: https://news.ycombinator.com/item?id=34422917 . It still remains true, a "simpler langchain"

> To offer this code-style interface on top of LLMs, I made something similar to LangChain, but scoped what i made to only focus on the bare functional interface and the concept of a "prompt function", and leave the power of the "execution flow" up to the language interpreter itself (in this case python) so the user can make anything with it.

I made a really lightweight wrapper over requests and call it lambdaprompt https://github.com/approximatelabs/lambdaprompt It has served all of my personal use-cases since making it, including powering `sketch` (copilot for pandas) https://github.com/approximatelabs/sketch

Core things it does: Uses jinja templates, does sync and async, and most importantly treats LLM completion endpoints as "function calls", which you can compose and build structures around just with simple python. I also combined it with fastapi so you can just serve up any templates you want directly as rest endpoints. It also offers callback hooks so you can log & trace execution graphs.
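To illustrate the "prompt function" idea in general terms (this is not lambdaprompt's actual API), here is a sketch that turns a template plus a completion callable into an ordinary Python function. It uses the standard library's `string.Template` in place of Jinja so that it runs without dependencies, and `fake_complete` is a stand-in for a real completion endpoint.

```python
from string import Template
from typing import Callable


def prompt_function(template: str, complete: Callable[[str], str]):
    """Wrap a template and a completion callable as a plain function."""
    compiled = Template(template)

    def call(**kwargs) -> str:
        # Fill the template, then hand the finished prompt to the model.
        return complete(compiled.substitute(**kwargs))

    return call


def fake_complete(prompt: str) -> str:
    # Stand-in for an HTTP call to a completion endpoint (assumption).
    return f"<completion for: {prompt}>"


summarize = prompt_function("Summarize in one sentence: $text", fake_complete)
title = prompt_function("Write a short title for: $summary", fake_complete)


def summarize_then_title(text: str) -> str:
    # Composition is ordinary function calls; no chain objects are involved.
    return title(summary=summarize(text=text))
```

Because each prompt is just a function, control flow stays in the host language.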

All together it's only ~600 lines of python.

I haven't had a chance to really push all the different examples out there, so I think it hasn't seen much adoption outside of those that give it a try.

I hope to get back to it sometime in the next week to introduce local-mode (eg. all the open source smaller models are now available, I want to make those first-class)

The use cases and tooling around language models are still very immature. So, any framework you build now will either look like bloatware or will remain close to just calling an API.

The dust around language models needs to settle a bit, for a useful framework to emerge from it.

For our own use-cases, I built a framework from scratch, and it was the best decision we made.

> For our own use-cases, I built a framework from scratch, and it was the best decision we made.

My thinking precisely. So you just used the "raw" OpenAI (I presume?) API, and no other tech on top?

Exactly. The most important part was working with Jinja templating. So, openai + jinja2.
very much agreed re: dust settling.

it makes no sense deploying any of these libraries to prod. as-is. best to understand a configuration / workflow / tuning / etc. that fits your data best and write it from scratch in golang/rust/whatever.

Are these computationally expensive operations? If not, Elixir could fit.
They are not all computationally expensive. The rate-limiting step here is the LLM call itself over the API. So, async is definitely needed. The other aspect would be loading the template from the filesystem. I would assume this could be something that needs to be optimized in the application.
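A small sketch of why async helps when the API call dominates: the waits overlap instead of adding up. `fake_llm_call` simulates network latency with a sleep and is an assumption, as is the concurrency cap.

```python
import asyncio


async def fake_llm_call(prompt: str, delay: float = 0.05) -> str:
    # Stand-in for the network round trip to an LLM API (assumption).
    await asyncio.sleep(delay)
    return prompt.upper()


async def run_all(prompts: list[str], max_concurrency: int = 4) -> list[str]:
    # The semaphore caps in-flight requests, e.g. to respect provider rate limits.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(prompt: str) -> str:
        async with semaphore:
            return await fake_llm_call(prompt)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(p) for p in prompts))
```

Calling `asyncio.run(run_all(prompts))` drives the whole batch from synchronous code.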
This recently via DataMachina substack:

https://blog.scottlogic.com/2023/05/04/langchain-mini.html

thanks for the share, will check out
lololol. i think this opportunity gets bigger post $10m seed round. they'll likely double down and expand footprint vs the inverse.

check out llama-index. it's purpose-built for document indexing and retrieval and less agents and "everything else"

That's pretty wild, I've been setting things up like this for about 5 years with just BERT or my own fine tuned encoder only systems. It should be done for free, not millions... Can I get millions for running `ls` too?
What do you mean by post $10m seed round?

Do you mean if LlamaIndex starts collecting VC? I'm not sure, are they for-profit?...

I was referring to Langchain who raised $10mm from Benchmark

https://blog.langchain.dev/announcing-our-10m-seed-round-led...

I'm fairly sure Jerry Liu (LlamaIndex founder) already has angels or will see enough traction to warrant a seed.

But this is a turn-key llm, that is built on langchain? A user doesn't need to dig into langchain themselves, right?
To be clear (apologies if I haven't made it so) this is not an LLM. This is an implementation of Rasa leveraging Langchain under the hood.

A user technically does not need to dig into Langchain themselves, but they would want to if they find their query results sub-optimal.

There are many indexing strategies and superficial parameters you could modify to tune output response. They are mentioned in the README.md.

LMQL (language model query language) is a different take on prompting, and I find it less restrictive and more intuitive. Langchain is to LMQL what Keras is to Tensorflow

https://lmql.ai/

Thanks for the link. I skimmed the docs and couldn't find a motivation section. Can you expand on how you find it less restrictive and more intuitive?

My first impression is that this is a paradigm mismatch and an 'API' masquerading as a "language". LMQApi? Looks fine, and we have all the necessary ports for (query, model, []constraints, ...).

So what's the language bit? It's the 'scripted prompt'. That's the only bit that is reasonably a 'language', but as a language it is all over the place. Semantics are rather wild, don't you agree?

    sample(temperature=0.8)
       "A list of things not to forget when going to the sea (not travelling): \n"
       backpack = []
       for i in range(5):
          "-[THING]"
          backpack.append(THING.strip())
       print(backpack)
    from
       'openai/text-ada-001'
    where
       STOPS_AT(THING, "\n")
This part reminds me of shell scripting (and what I hate about it). For example, what are the semantics of > "something quoted" < in this language? How about "THING" and THING? Is that a token, a variable, or both?

So, we really have an 'imperative' language part (the scripted prompt) and then a pretense at "declarative language" with the elaborated api call spelled out as a sqlish query.

p.s. I appreciate and laud the effort of the team which produced this. This is just feedback.

Yeah the semantics are very weird, but I guess “prompt engineering” is weird too, so it makes sense :) .

Everything between “sample” and “from” is basically a script that generates a prompt, which is incrementally fed to the LM.

Each line contained in double quotes will get appended to the prompt, using an f-string syntax, like normal LM templates. So if you have a local python variable “foo”, you can say “how do I make {foo}?” and it will substitute its value into the prompt (not interesting).

But things in square brackets are called “hole variables”, and do the opposite. If you follow up the previous with the line “you make it by [instructions]” , the prompt up to that point is passed to the LM, and the hole in the prompt is filled, and the result is stored in a local variable “instructions” which you can reference later on in the prompt, or in python script.

Any lines in between that don’t have double quotes are interpreted as python. So you can make program logic and LM calls conditional on the result of previous LM calls, or other results of some other process. So for example you could build a critique loop like the critique chain in the LC docs out of an actual while loop, where the while loop breaks when the LM determines the output is acceptable.

The exact same thing is possible with LangChain already, but it would involve creating templates, instantiating chains, etc, which isn’t bad, but adds complexity. In LMQL syntax, you can glance at the program and plainly see what it does using your programming brain… “yeah this while loop breaks when the screenplay is good enough, and the refined version gets returned” whereas I think LC’s abstractions make something simple like this look complex.
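Stripped of any framework, the critique loop described above is just ordinary control flow. The sketch below is plain Python, not LMQL; `critique` and `refine` stand for LM calls and are replaced by toy functions here, and the `max_rounds` cap is an added assumption so the loop always terminates.

```python
from typing import Callable


def critique_loop(draft: str,
                  critique: Callable[[str], str],
                  refine: Callable[[str], str],
                  max_rounds: int = 5) -> str:
    """Refine a draft until the critic accepts it or the round limit is hit."""
    for _ in range(max_rounds):
        if critique(draft) == "good enough":
            break
        draft = refine(draft)
    return draft


# Toy stand-ins for the two LM calls (assumptions for illustration only).
def toy_critique(draft: str) -> str:
    return "good enough" if len(draft.split()) >= 5 else "needs improvement"


def toy_refine(draft: str) -> str:
    return draft + " more"
```

The loop condition is visible at a glance, which is the readability argument being made here.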

The “where” clause is where you specify constraints, which allow you to limit what the value of a hole can be. In this case you could apply a “where” constraint to a hole variable [rating] that forces it to be either “good enough” or “needs improvement”, and nothing else can possibly be sampled from the token distribution. This makes pipelines a lot more efficient by eliminating the need for “correction chains” in a lot of places. Also, once the tokens “ne” or “go” have been generated, LMQL doesn’t have to request any more tokens because the result is already uniquely determined, and it can substitute the rest and move on.
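The early-stopping behaviour can be illustrated with a small prefix check. This sketch works on characters rather than tokens and is not LMQL's implementation; it only shows why no further sampling is needed once a single allowed value remains.

```python
from typing import Optional


def allowed_next(prefix: str, options: list[str]) -> set[str]:
    """Characters that may follow `prefix` without leaving the allowed set."""
    return {o[len(prefix)] for o in options
            if o.startswith(prefix) and len(o) > len(prefix)}


def resolve_choice(prefix: str, options: list[str]) -> Optional[str]:
    """Return the full value once the prefix matches exactly one option."""
    matches = [o for o in options if o.startswith(prefix)]
    return matches[0] if len(matches) == 1 else None


RATINGS = ["good enough", "needs improvement"]
```

Here `allowed_next` plays the role of masking the distribution, and `resolve_choice` is the point at which the remainder can be filled in without another model call.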

The other thing that I love about LMQL is that everything is async. Last time I tried, maybe two months ago, making a LC chain asynchronous didn’t feel natural. In my use cases, chains were async more often than not and it was kind of annoying.

In fact under the hood, the LMQL query is compiled to a decorated async function. So at the end of the day, you can use any of your queries as simple async functions. If you want to make react Agents, or any other LM abstraction you like, you pretty much just have to stick a few @lmql.query decorated functions inside a class definition and you’re good to go. That’s what I meant by the Tensorflow/keras analogy.

LMQL still isn’t mature and there’s a lot on the roadmap. Prompting is a wild west, and altogether we haven’t even discovered a lot of the problems we will need to solve. I like to think the situation is like how I imagine operating systems and a lot of software in general looked before Bell Labs. For now at least, I think of all the options, LMQL is closest to the golden path.

Let me know if you have any more questions, feel free to send an email!

Very informative, thank you. You make a strong case for it. Interesting how in QL the query plan is kinda spelled out in the select. I also appreciate the motivation of 'seeing the algorithm'; makes sense. Why SQL-ish approach?

> Prompting is a wild west

I am racking my brain trying to remember a continuation based language that made it to hn frontpage recently. Wondering if something like that isn't a better approach for prompting.

But quite interesting. Thanks for the writeup.

Was it crystal by chance? I meant to have a look at it but never bothered, if it is maybe I will.

As for the SQLish approach, I’m not sure, it just seems to fit. I think it came from the way that there’s a thing you are requesting, and you have constraints you want applied to it. I think it’s one of those things where the analogy to SQL gives us developers just enough of a toe hold on what we’re doing to produce something but ultimately I think it will start to look less like SQL.

Great answer.

Though you say at some point

> That’s what I meant by the Tensorflow/keras analogy.

It seems you didn't mention keras before? Curious about what that analogy is about.

Earlier in the thread op said, “Langchain is to LMQL what Keras is to Tensorflow.”
very interesting abstraction. very DBT-esque. i will dig into the docs, thanks for sharing!
I keep hearing stuff like, "why use X framework or Y library and why not write it yourself?"

As AI moves from academia into mainstream dev these things help bridge gaps for those who don't understand the full pipeline.

Many people asking these questions have the burden of knowledge and can't remember what it is like for average devs to dive into this stuff.

Similar questions were asked about why we ever needed Entity Framework or Express for Node.

well to be fair, when you're scaling it does matter. i would want my techlead or seniors to care and know when/where to make specific trade-offs bc cloud costs are not forgiving.

i think that's where folks that make those comments are coming from.

Agree about Langchain. It's tedious to work with. I don't want so many abstractions.
Amen. Constructive feedback to Langchain dev(s):

- Reduce bloat, make packages optional e.g. pip install langchain[all]
- Reduce opinionated implementation of vector stores, I want my own schema
- Don't unnaturally force the chain abstraction
- Invest more in document retrieval