Hacker News
by Zafira 104 days ago
> At the moment it is a mysterious, occasionally fickle, tool - but if you provide the correct feedback mechanisms and provide small tweaks and context at idiosyncrasies, it's possible to get agents to reliably build very complex.

This sounds like arguing you can use these models to beat a game of whack-a-mole if you just know all the unknown unknowns and prompt it correctly about them.

This is an assertion that is impossible to prove or disprove.


No, it's more like this: if you already knew how to build it, LLM agents help you build it faster. There's really no useful analogy I can think of, but it fits my current role perfectly because my work is constantly interrupted by prod support, coordination, planning, context switching between issues, etc.

I rarely have blocks of "flow time" to do focused work. With LLMs I can keep progressing in parallel and then when I get to the block of time where I can actually dive deep it's review and guidance again - focus on high impact stuff instead of the noise.

I don't think I'm any faster with this than my theoretical speed (LLMs spend a lot of time rebuilding context between steps, I have a feeling the current level of agents is terrible at maintaining context for larger tasks, and I'm also guessing the model context length is a white lie - they might support working with 100k tokens, but agents keep reloading stuff into context because old stuff is ignored).

In practice I can get more done because I can get into the flow and back onto the task a lot faster. We'll see how this pans out long term, but in my current role I don't think there are alternatives; my performance would be shit otherwise.

You could probably replace LLM with "junior engineer" here as it sounds like you're basically a manager now. The big negative that LLMs have in comparison with junior engineers is that they can't learn and internalise new information based on feedback.

"The big negative that LLMs have in comparison with junior engineers is that they can't learn and internalise new information based on feedback."

No, but they can take "notes" and can load those notes into context. That does work, but is of course not so easy as it is with humans.

It is all about cleaning up and maintaining a tidy context.
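
To make the "notes" idea concrete, here is a minimal sketch. It assumes a plain-text notes file and a hand-rolled prompt builder; the names `append_note`, `load_notes` and `build_prompt` are hypothetical and do not come from any real agent framework.

```python
from pathlib import Path


def append_note(notes_path: Path, note: str) -> None:
    """Append one lesson learned to the notes file, one note per line."""
    with notes_path.open("a", encoding="utf-8") as f:
        f.write(note.strip() + "\n")


def load_notes(notes_path: Path, max_notes: int = 20) -> list[str]:
    """Return the most recent notes, deduplicated, to keep the context tidy."""
    if not notes_path.exists():
        return []
    text = notes_path.read_text(encoding="utf-8")
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # dict.fromkeys drops duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(lines))
    return unique[-max_notes:]


def build_prompt(task: str, notes: list[str]) -> str:
    """Prepend saved notes to the task so the model sees them in context."""
    if not notes:
        return task
    bullets = "\n".join(f"- {n}" for n in notes)
    return f"Notes from earlier sessions:\n{bullets}\n\nTask: {task}"
```

The deduplication and the `max_notes` cap are the "tidy context" part: the model only "remembers" what gets loaded back in, so stale or repeated notes cost context without adding anything.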

I don't like that analogy. If I had to work with a Claude-like junior I would ask for them to get removed from my team - inability to learn stuff, completely unexpected/unrelatable failure modes and performance.

On the other hand, Claude's tenacity, stamina and sustained speed are superhuman. The more capable models become, the more valuable this is.

The same is true with human engineers - isn't this just what engineering is?

>This is an assertion that is impossible to prove or disprove.

This is a joke, right? There are complex systems that exist today that are built exclusively via AI. Is that not obvious?

The existence of such complex systems IS proof. I don't understand how people walk around claiming there's no proof? Really?

The assertion was "if you really know how to prompt, give feedback, do small corrections and fix LLM errors, then everything works fine".

It is impossible to prove or disprove because if everything DOES NOT work fine you can always say that the prompts were bad, the agent was not configured correctly, the model was old, etc. And if it DOES work, then all of the previous was done correctly, but without any decent definition of what correct means.

>And if it DOES work, then all of the previous was done correctly, but without any decent definition of what correct means.

If a program works, it means it's correct. If we know it's correct, it means we have a definition of what correct means; otherwise, how could we classify anything as "correct" or "incorrect"? Then we can look at the prompts and see what was done in them, and that would be a "correct" way of prompting the LLM.

You don’t know it works. That you so glibly speak about products working is proof that your engineering judgment is impaired. You can’t infer the exact contents of a black box merely by looking at outside behavior.

The fundamental fallacy you are exhibiting here is similar to saying that rolling a six sided die and getting a “6” means that you will always get a 6 any time you roll it. And that if you get a 6 and wanted a 6, you must have therefore rolled those dice “correctly” and had you not gotten a 6 that would have meant you rolled them “wrong.”

You know that is not true.

>You don’t know it works. That you so glibly speak about products working is proof that your engineering judgment is impaired. You can’t infer the exact contents of a black box merely by looking at outside behavior.

I don't know the exact internals of a car. But I can infer my car works by driving it.

>The fundamental fallacy you are exhibiting here is similar to saying that rolling a six sided die and getting a “6” means that you will always get a 6 any time you roll it. And that if you get a 6 and wanted a 6, you must have therefore rolled those dice “correctly” and had you not gotten a 6 that would have meant you rolled them “wrong.”

Bro, we rolled that die MULTIPLE times. It's not a one-time thing. And the "rolling" of the die is done with a CHAIN of MULTIPLE queries strung together. This is not one roll. It's a multitude of data points. Yes, results can be inconsistent from a technical standpoint, but the general result converges on a singular trend.

That much we know is true: a statistic. And that is the most we can say about reality as we know it, because science, formalized, can only give a statistic as an answer.
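
To put rough numbers on the "many rolls" point, here is a toy simulation. It assumes every query in a chain succeeds independently with the same fixed probability, which is a simplification, and the function names and parameters are made up for illustration.

```python
import random


def chain_succeeds(steps: int, p_step: float, rng: random.Random) -> bool:
    """One task is a chain of queries; it succeeds only if every step does."""
    return all(rng.random() < p_step for _ in range(steps))


def estimate_success_rate(trials: int, steps: int, p_step: float, seed: int = 0) -> float:
    """Fraction of simulated tasks that succeed over many independent trials."""
    rng = random.Random(seed)
    successes = sum(chain_succeeds(steps, p_step, rng) for _ in range(trials))
    return successes / trials


def exact_success_rate(steps: int, p_step: float) -> float:
    """Closed-form rate under the independence assumption: p_step ** steps."""
    return p_step ** steps
```

Any single run can go either way, but over many trials the observed rate settles near `p_step ** steps`. That is a statement about the trend across runs, not a guarantee about any one of them.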

"I don't know the exact internals of a car. But I can infer my car works by driving it."

No, you can't infer that it "works." Only that it CAN work. The car may be poisoning you with carbon monoxide. Your rear brakes may have become disconnected (happened to me). The antilock braking system may have a faulty sensor that only fails at very low speed, leading to them engaging when making a normal stop, but also preventing the mechanic from seeing the problem, because he didn't listen to your bug report and instead tried to repro the effect with high speed panic stops (also happened to me).

If I use a product and have a good experience, I can conclude that SOMETHING must be going well, but not that EVERYTHING is going well.

This is reasoning about evidence 101.
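
The ABS story can be sketched as a toy example: a controller with a fault that only shows up at low speed, and a check that only exercises high speeds. Everything here (function names, the 10 km/h threshold, the slip values) is invented for illustration and says nothing about how real ABS controllers work.

```python
def abs_should_engage(speed_kmh: float, wheel_slip: float) -> bool:
    """Toy ABS decision: engage only when wheel slip is high.

    Deliberate bug for illustration: below a hypothetical 10 km/h a faulty
    sensor reading is treated as slip, so ABS engages on a gentle stop.
    """
    if speed_kmh < 10:
        return True  # wrong: ignores the actual slip value
    return wheel_slip > 0.2


def high_speed_panic_stop_check() -> bool:
    """The mechanic's repro: only high-speed, high-slip stops."""
    return all(abs_should_engage(speed, 0.5) for speed in (80, 100, 120))


def low_speed_normal_stop_check() -> bool:
    """The driver's bug report: gentle low-speed stop, ABS should stay off."""
    return not abs_should_engage(5, 0.0)
```

The high-speed check passes and the low-speed check fails. A passing check shows that SOMETHING works under the conditions exercised, not that EVERYTHING works.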