Hacker News new | ask | show | jobs
by xrd 622 days ago
I want hear more about this. I'm playing with langroid, crew.ai, and dspy and they all layer so many abstractions on top of a shifting LLM landscape. I can't believe anyone is really using them in the way their readme goals profess.
1 comments

Not you in particular, but I hear this common refrain that the "LLM landscape is shifting", but what exactly is shifting? Yes new models are constantly announced, but at the end of the day, interacting with the LLMs involves making calls to an API, and the OpenAI API (and perhaps Anthropic's variant) has become fairly established, and this API will obviously not change significantly any time soon.

Given that there is (a fairly standard) API to interact with LLMs, the next question is, what abstractions and primitives help easily build applications on top of these, while giving enough flexibility for complex use cases.

The features in Langroid have evolved in response to the requirements of various use-cases that arose while building applications for clients, or companies that have requested them.

Sonnet 3.5 and other large context models made context management approaches irrelevant and will continue to do so.

o1 (and likely sonnet 3.5) made chain of through and other complex prompt engineering irrelevant.

Realtime API (and others that will soon follow) will made the best VTT > LLM > TTV irrelevant.

VLMs will likely make LLMs irrelevant. Who knows what Google has planned for Gemini 2.

The point is building these complex agents has been proven a waste of time over and over again until, at least until we see a plateau in models. It's much easier to swap in a single API call and modify one or two prompts than to rework a convoluted agentic approach. Especially when it's very clear that the same prompts can't be reused reliably between different models.

I encourage you to run evals on result quality for real b2b tasks before making these claims. Almost all of your post is measurably wrong in ways that cause customers to churn an AI product same-day.
I appreciate your comment.

I suppose my comment is reserved more for the documentation than the actual models in the wild?

I do worry that LLM service providers won't do any better than rest API providers in versioning their backend. Even if we specify the model in the call to the API, it feels like it will silently be upgraded behind the scenes. There are so many parameters that could be adjusted to "improve" the experience for users even if the weights don't change.

I prefer to use open weight models when possible. But so many agentic frameworks, like this one (to be fair, I would not expect OpenAI to offer a framework that work local first), treat the local LLM experience as second class, at best.

Years ago we complained about the speed with which new JavaScript frameworks were popping into existence. Today it goes one order of magnitude faster, and the quality of the outputs can only be suffering. Yes there's code but so and so, interfaces and APIs change dramatically, and the documentation is a few versions behind. Who has time to compare simply cannot do it in depth, and ideas get also dropped on the way. I don't want to call it a mess because it's too negative, to have many ideas is great but I feel we're still in the brainstorming phase.