Hacker News
by infecto 340 days ago
Link does not work for me but as someone who does a lot of work with LLMs I am also betting against agents.

Agents have captivated the minds of groups of people in each large engineering org. I have no idea what their goal is other than that they work on “GenAI”. For over a year now they have been working on agents with the promise that the next framework that MSFT or Alphabet publishes will solve their woes. They don’t actually know what they are solving for except that everything involves agents.

I have yet to see agents solve anything, but for some reason there is this idea that having an agent you can send anything and everything to will solve all problems for the company. LLMs have a ton of interesting applications, but agents have yet to grab me as interesting. I also don’t understand why so many large companies have focused time on them. They are not going to be cracking the code ahead of a commercial tool or open source project. In the time spent toying around with agents, a lot of interesting applications could have been built, some of which may technically be agents, but without so much focus and effort on trying to solve for all use cases.

Edit: after rereading my post, I wanted to clarify that I do think there is a place for tool call chains and the like, but so many folks I have talked to firsthand are trying to create something that works for everything and anything.

5 comments

I think in general if everyone is talking about a solution and nobody is talking about problems then it's a sign we're in a bubble.

For me the only problem I have is I find typing slow and laborious. I've always said if I could find a way to type less I would take it. That's why I've been using tab completion and refactoring tools etc for years now. So I'm kind of excited about being able to get my thoughts into the computer more quickly.

But having it think for me? That's not a problem I have. Reading and assimilating information? Again, not a problem I have. Too much of this is about trying to apply a solution where there is no problem.

Maybe you are in a job where it’s not a good use case, but there are fields that handle massive amounts of data, or that spend a huge amount of time waiting on data processing before moving to the next step. There, I think handing the work off to an AI agent, with a human then putting the pieces together based on their own logic and experience, would work quite nicely.
The HN fallacy that is a large % of the posts on AI

"AI is not good for what I do, therefore AI is useless"

Not quite sure what you are proposing here. What exactly is the AI agent solving in this example?

I keep hearing vague stuff exactly like your comment at work from management. It's so infuriating.

For instance, cybersecurity toolsets like MDE capture a lot of data. That data is meaningless unless someone is looking through it, and at my org there isn’t enough manpower to do that. One solution is using an agent to help triage that network log data, flagging what is suspicious or worth a human following up on.
<< I also don’t understand why so many large companies have focused time around it. They are not going to be cracking the code ahead of a commercial tool or open source project.

I think it is a mix of fomo and the 'upside' potential of being able to minimize ( ideally remove ) the expensive "human component". Note, I am merely trying to portray a specific world model.

<< In the time spent toying around with agents there are a lot of interesting applications that could have built, some of which may be technically an agent but without so much focus and effort on trying to solve for all use cases.

Preaching to the choir man. We just got a custom AI tool ( which manages to have all my industry specific restrictions rendering it kinda pointless, low context making it annoying, and slower than normal, because it now has to go through several layers of approval including 'bias' ).

At the same time, a committee bickers over a minute change to a process that has effectively no impact on anything of value.

Bonkers.

>I think it is a mix of fomo and the 'upside' potential of being able to minimize ( ideally remove ) the expensive "human component". Note, I am merely trying to portray a specific world model.

IOW, it's a case of C-suite "monkey see, monkey do" kicked off by management consultants with crap to sell for very high prices...

I have no idea what agents are for, could be my own ignorance.

That said, I have been using LLMs for a while now with great benefit. I did not notice anything missing, and I am not sure what agents bring to the table. Do you know?

You are a manual agent to LLMs when you use things like ChatGPT. You go through a workflow loop when you try to investigate and consult with an LLM. Agents are just trying to automate your workflow against an LLM. It's basically just scripting. Scripting these LLMs is where we all want to go, but the context window length is a limiting factor, as well as inferencing on any notable sized window.

I'll manage my whiney emotions over the term Agents, but you'll have to hold a gun to my head before I embrace "Agentic", which is a thoroughly stupid word. "Scripted workflow" is what it is, but I know there are some true "visionaries" out there ready to call it "Sentient workflow".

Exactly, thank you.

What I am doing is definitely manual, it is the old-fashioned prompt-copy-paste-test-repeat cycle, but it has been educational.

I will join you in the fight against "agentic". Ridiculous.
An agent is an LLM + a tool call loop - it is quite a step up in terms of value in my experience
Agents are more than that.

Agents, besides tool use, also have memory, can plan work towards a goal, and can, through an iterative process (Reflect - Act), validate if they are on the right track.

If an agent takes a Topic A and goes down a rabbit hole all the way to Topic Z, you'll see that it won't be able to incorporate or backtrack to Topic A without losing a lot of detail from the trek down to Topic Z. It's a serious limitation right now from the application development side of things, but I'm just reiterating what the article pointed out, which is that you need to work with fewer-step workflows that aren't as ambitious as covering everything from A to Z.
Yes, that's commonly referred to as the Exploration-Exploitation Dilemma. Should the agent go deep or wide?

https://en.wikipedia.org/wiki/Exploration%E2%80%93exploitati...

Not a disagreement with you but wanted to further clarify.

I do think it’s a step up when done correctly. Thinking of tools like Cursor. Most of my concern comes from the number of folks I have seen trying to create a system that solves everything. I know in my org people were working on agents without even having a problem they were solving for. They are effectively trying to recreate ChatGPT, which to me is a fool’s errand.

I’d boil it down thusly:

What do agents provide? Asynchronous work output, decoupled from human time.

That’s super valuable in a lot of use cases! Especially because it’s a prerequisite for parallelizing “AI” use (1 human : many AI).

But the key insight from TFA (which I 100% agree with) is that the tyranny of sub-100% reliability compounded across multiple independent steps is brutal.

Practical agent folks should be engineering risk / reliability, instead of happy path.

And there are patterns and approaches to do that (bounded inputs, pre-classification into workable / not-workable, human in the loop), but many teams aren’t looking at the right problem (risk/reliability) and therefore aren’t architecting to those methods.

And there’s fundamentally no way to compose 2 sequential 99% reliable steps into a 99% reliable system with a risk-naive approach.
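To put rough numbers on that: if each step succeeds with probability p and the steps fail independently, a chain of n steps succeeds with probability p^n. A minimal sketch of the arithmetic, assuming independence (a simplification) and using an illustrative function name:

```python
def chain_reliability(step_reliability: float, steps: int) -> float:
    """Probability that every step in a sequential chain succeeds,
    assuming each step fails independently of the others."""
    return step_reliability ** steps


if __name__ == "__main__":
    # Print how overall reliability decays as the chain gets longer.
    for n in (1, 2, 5, 10, 20, 50):
        overall = chain_reliability(0.99, n)
        print(f"{n:>2} steps at 99% each: {overall:.1%} overall")
```

Two 99% steps already land at about 98%, and twenty of them at roughly 82%, which is why the mitigations above (bounded inputs, pre-classification, human in the loop) matter more as chains get longer.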

What is the use case? What does it solve exactly, or what practical value does it give you? I am not sure what a tool call loop is.
An example:

I updated a Svelte component at work, and while I could test it in the browser and see it worked fine, the existing unit test suddenly started failing. I spent about an hour trying to figure out why the results logged in the test didn't match the results in the browser.

I got frustrated, gave in and asked Claude Code, an AI agent. The tool call loop is something like: it reads my code, then looks up the documentation, then proposes a change to the test which I approve, then it re-runs the test, feeds the output back into the AI, re-checks the documentation, and then proposes another change.
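For anyone unsure what a tool call loop looks like in code, here is a rough sketch of the shape described above. Everything in it is hypothetical: `fake_model` stands in for a real LLM call, and the two tools are stubs, so this shows only the control flow, not how Claude Code or any other product is actually implemented.

```python
from typing import Callable


# Hypothetical tools. A real agent would actually read files and run tests.
def read_file(path: str) -> str:
    return f"<contents of {path}>"


def run_tests(_: str) -> str:
    return "1 failed"


TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": read_file,
    "run_tests": run_tests,
}


def fake_model(history: list[str]) -> dict:
    """Stand-in for an LLM call: returns a tool request or a final answer."""
    results_seen = sum(1 for m in history if m.startswith("tool_result:"))
    if results_seen == 0:
        return {"tool": "read_file", "arg": "Component.test.js"}
    if results_seen == 1:
        return {"tool": "run_tests", "arg": ""}
    return {"final": "Tests still fail; proposing a change to the assertion."}


def agent_loop(task: str, model=fake_model, max_steps: int = 10) -> str:
    """Ask the model, run whichever tool it requests, feed the result back."""
    history = [f"user: {task}"]
    for _ in range(max_steps):
        reply = model(history)
        if "final" in reply:
            return reply["final"]
        result = TOOLS[reply["tool"]](reply["arg"])
        history.append(f"tool_result: {result}")
    return "gave up: step limit reached"


if __name__ == "__main__":
    print(agent_loop("fix the failing unit test"))
```

The step limit is there because nothing else guarantees the model ever decides it is done.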

It's all quite impressive, or it would be if at one point it didn't randomly say "we fixed it! The first element is now active" -- except it wasn't, Claude thought the first element was element [1], when of course the first element in an array is [0]. The test hadn't even actually passed.

An hour and a few thousand Claude tokens my company paid for and got nothing back for lol.

Any examples outside of coding agents?

Even in this example the coding agent is short-lived. I am curious about continuously running agents that are never done.

A friend of mine set up a cron job coupled with the Claude API to process his email inbox every 30 minutes and unsubscribe/archive/delete as necessary. It could also be expanded to draft replies (I forget if his does this) and even send them, if you’re feeling lucky. I’m pretty sure the AI (I’m guessing Claude Code in this case) wrote most or all of the code for the script that does the interaction with the email API.

An example of my own, not agentic or running in a loop, but might be an interesting example of a use case for this stuff: I had a CSV file of old coupon codes I needed to process. Everything would start in limbo, uncategorized. Then I wanted to be able to search for some common substrings and delete them, search for other common substrings and keep them. I described what I wanted to do to Claude 3.7 and it built out a Ruby script that gave me an interactive menu of commands like search to select/show all/delete selected/keep selected. It was an awesome little throwaway script that would’ve taken me embarrassingly long to write, or I could’ve done it all by hand in Excel or at the command line with grep and stuff, but I think it would’ve taken longer.
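The core of a script like that is small. This is a guess at its shape, written in Python rather than Ruby and without the interactive menu. The function names, the one-code-per-row CSV layout, and the sample codes are all assumptions for illustration.

```python
import csv
import io


def load_codes(csv_text: str) -> dict[str, str]:
    """Read one coupon code per row (first column); everything starts in limbo."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0].strip(): "limbo" for row in reader if row and row[0].strip()}


def search(codes: dict[str, str], substring: str) -> list[str]:
    """Select still-uncategorized codes containing the substring (case-insensitive)."""
    needle = substring.lower()
    return [
        code
        for code, state in codes.items()
        if state == "limbo" and needle in code.lower()
    ]


def mark(codes: dict[str, str], selected: list[str], state: str) -> None:
    """Move the selected codes into 'keep' or 'delete'."""
    for code in selected:
        codes[code] = state


if __name__ == "__main__":
    codes = load_codes("SUMMER10\nSUMMER20\nVIP-2019\nTEST-abc\n")
    mark(codes, search(codes, "test"), "delete")
    mark(codes, search(codes, "summer"), "keep")
    for code, state in codes.items():
        print(f"{state:<6} {code}")
```

An interactive menu would just be a loop around `search` and `mark` that reads commands from the terminal.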

Honestly one of the hard things about using AI for me is remembering to try to use it, or coming up with interesting things to try. Building up that new pattern recognition.

No, the fact Claude couldn't remember that JavaScript is zero-indexed for more than 20 minutes has not left me interested in letting it take on bigger tasks
The tools can be an editor, terminal, or dev environment, with the agent automatically iterating, testing the changes, and refining until there is a finished product, without a human developer. At least, that is what some hope for.
Oh, okay, I understand it now, especially with the other comment that said Cursor is one. OK, makes sense. Seems like it "just" reduces friction (quite a lot).
Yeah, it's really just a user experience improvement. In particular, it makes AI look a lot better if it can internally retry a bunch of times until it comes up with valid code or whatever, instead of you having to see each error and prompt it to fix it. (Also, sometimes they can do fancy sampling tricks to force the AI to produce a syntactically valid result the first time. Mostly this is just used for simple JSON schemas though.)
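The internal retry part is easy to picture. Below is a minimal sketch of "retry until the output parses", not of the sampling tricks, which happen inside the model's decoder. The `generate` callable is a placeholder for whatever LLM call you use, and `flaky_model` is a made-up stub that fails once and then succeeds.

```python
import json
from typing import Callable, Optional


def generate_valid_json(
    generate: Callable[[str], str],
    prompt: str,
    required_keys: set[str],
    max_attempts: int = 3,
) -> Optional[dict]:
    """Call the model until it returns a JSON object with the required keys."""
    feedback = ""
    for _ in range(max_attempts):
        raw = generate(prompt + feedback)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as err:
            # Feed the error back so the next attempt can correct it.
            feedback = f"\nPrevious output was not valid JSON ({err.msg}). Try again."
            continue
        if isinstance(data, dict) and required_keys.issubset(data):
            return data
        feedback = f"\nPrevious output lacked keys {sorted(required_keys)}. Try again."
    return None  # The caller only sees a failure after every attempt is used up.


def flaky_model() -> Callable[[str], str]:
    """Hypothetical stub: truncated JSON on the first call, valid JSON on the second."""
    outputs = iter(['{"name": "widget", ', '{"name": "widget", "price": 3}'])
    return lambda prompt: next(outputs)


if __name__ == "__main__":
    print(generate_valid_json(flaky_model(), "Describe the item as JSON.", {"name", "price"}))
```

From the user's side the first failure never shows up, which is the "looks a lot better" effect.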
> I am not sure what a tool call loop is.

See https://ampcode.com/how-to-build-an-agent

that was a great read, thanks! - agentic noob
Cursor is my classic example. I don’t know exactly what tools are defined in their loop but you give the agent some code to write. It may search your code base, it may then search online for third party library docs. Then come back and write some code etc.
If it were only tool use, then it would be the same as a lambda function.
Link is working for me — perhaps it was not 30 minutes ago? (Safari, MacOS)
[flagged]
That’s a bit reductive and misses the core issue. Of course companies want to reduce headcount or boost productivity, but many are pursuing these initiatives without a clear problem in mind. If the mandate were, say, “we’re building X to reduce customer support staff by 20%,” that would be a different story. Instead, it often feels like solution-first thinking without a clear target.

Edit: not even going to reply to comments below as they continue down a singular path of oh you ought to know what they are trying to do. The only point I was making is orgs are going solution-first without a real problem they are trying to solve and I don’t think that is the right approach.

> “we’re building X to reduce customer support staff by 20%,”

I've never understood the "do X to increase/decrease Y by Z%". I remember working at McDonalds and the managers worked themselves up into a frenzy to increase "sale of McSlurry by 10%". All it meant was that they nagged people more and sold less of something else. It's not like people's stomachs got 10% larger.

The sad part is that companies doing this will very soon figure out that the 20% staff reduction they "achieved" came only at the cost of a 100% increase in development costs and fees to the LLM vendor. Moreover, after a few years these fees will skyrocket because their businesses are now dependent on this technology and, unlike people, LLMs are monopolized by just a few robber barons.
That is not a goal that can be shared without alienating the current workforce. So you can bet that goal was clearly stated at CXO level, and is being communicated/translated piecewise as “let’s find out how much more productive we can get with AI”. You’re going to find out about the goal once you reach it.

That is not to say you should work against your company, but bear in mind this is a goal and you should consider where you can add value outside of general code factory productivity and how for example you can become a force multiplier for the company.

I agree, and would like to hear examples of where this has not been the case. I'm sure they're out there. But pretty much everything has been "how can we use LLMs" and "it doesn't matter if it was a problem that we had that needed to be solved; we need to gain experience now because AI is The Future and we can't be left behind".

Occasionally it works and people stumble across a problem worth solving as they go about applying their solution to everything. But that's not planning or top-down direction. That's not identifying a target in advance.

Yes, my organization head has asked us to submit "Generative AI Agent" proposals for an upcoming planning session. Apparently those ideas will get the big seat at the planning table. I've been trying to think of ideas, but they all end up being some sort of workflow automation that was possible without agent stuff.

Agreed with your annoyance at the "they are replacing you" comments. Like, duh. That's what they've been doing forever.