| HN Mirror


by diggan 353 days ago

The huge gap between the people who claim "It helps me some/most of the time" and the other people who claim "I've tried everything and it's all bad" is really interesting to me.

Is it a problem of knowledge? Is it a problem of hype that makes people over-estimate their productivity? Is it a problem of UX, where it's hard to figure out how to use these tools correctly? Is it a problem of the user's skills, where low-skilled developers see lots of value but high-skilled developers see no value, or even negative value sometimes?

The experiences seem so different, that I'm having a hard time wrapping my mind around it. I find LLMs useful in some particular instances, but not all of them, and I don't see them as the second coming of Jesus. But then I keep seeing people saying they've tried all the tools, and all the approaches, and they understand prompting, yet they cannot get any value whatsoever from the tools.

This is maybe a bit out there, but would anyone (including parent) be up for sending me a screen recording of exactly what you're doing, if you're one of the people that get no value whatsoever from using LLMs? Or maybe even a video call sharing your screen?

I'm not working in the space and have no products or services to sell. I'm only curious why this vast gap seemingly exists, and my only motive is to understand whether I'm the one who is missing something, or whether there are more effective ways to help people understand how they can use LLMs and what they can use them for.

My email is on my profile if anyone is up for it. Invitation open for anyone struggling to get any useful responses from LLMs.

2 comments

troupo 353 days ago

> The experiences seem so different, that I'm having a hard time wrapping my mind around it.

Because we only see very disjointed descriptions, with no attempt to quantify what we're talking about.

For every description of how LLMs work or don't work we know only some, but not all of the following:

- Do we know which projects people work on? No

- Do we know which codebases (greenfield, mature, proprietary etc.) people work on? No

- Do we know the level of expertise the people have? Is the expertise in the same domain, codebase, language that they apply LLMs to?

- How much additional work did they have reviewing, fixing, deploying, finishing etc.?

Even if you have one person describing all of the above, you will not be able to compare their experience to anyone else's because you have no idea what others answer for any of those bullet points.

And that's before we get into how all these systems and agents are completely non-deterministic, and what works now may not work even 1 minute from now for the exact same problem.

And that's before we ask the question of how a senior engineer's experience with a greenfield project in React with one agent and model can even be compared to a non-coding designer in a closed-source proprietary codebase in OCaml with a different agent and model (or even the same, because of non-determinism).


skydhash 353 days ago

> And that's before we get into how all these systems and agents are completely non-deterministic,

And that is the main issue. For some the value is reproducible results, for others, as long as they got a good result, it's fine.

It's like coin tossing. You may want tail all the time, because that's your chosen bet. You may prefer tail, but don't mind losing money if it's head. You may not be interested in either, but you're doing the tossing and want to know the techniques that work best for getting tail. Or you're just trying, and if it's tail, your reaction is only "That's interesting".

The coin itself does not matter and the tossing is just an action. The output is what gets judged. And the judgment will vary based on the person doing it.

So software engineering used to be the pursuit of tail all of the time (by putting the coin on the ground, not tossing it). Then LLM users say it's fine to toss the coin, because you'll get tail eventually. And companies are now pursuing the best coin tossing techniques to get tail. And for some, when the coin tossing gives tail, they only say "that's a nice toss".


troupo 353 days ago

> And companies are now pursuing the best coin tossing techniques to get tail.

With the only difference that the techniques for throwing coins can be verified by comparing the results of the tosses. More generally it's known as forcing https://en.wikipedia.org/wiki/Forcing_(magic)

What we have instead is companies (and people) saying they have perfected the toss not just for a specific coin, but for any object in general, when it's very hard to prove that it's true even for a single coin :)

That said, I really like your comment :)


skydhash 353 days ago

I think it's going to be personal, because people define value in different ways, and the definition depends on the current context. I've used LLMs for things like shell scripts, plotting with pyplot, explanations, and so on, but always taking the output with a huge grain of salt. What I'm looking for is not the output itself, but the direction it can give me. And the only value is when I'm pressed for time and can't use a more objective and complete approach.

When you read the manual page for a program, or the documentation for a library, the things described always (99.99999...%) exist. So I can take it as objective truth. The description may be lacking, so I don't have a complete picture, but it's not pure fantasy. And if it turns out that it is, the solution is to drop it and turn back.

So when I act upon it, and the result comes back, I question my approach, not the information. And often I find the flaw quickly. It's slower initially, but the final result is something I have good confidence in.


diggan 353 days ago

> And often I find the flaw quickly. It's slower initially, but the final result is something I have good confidence in.

I guess what I'm looking for are people who don't have that experience, because you seem to be getting some value out of using LLMs at least, if I understand you correctly?

There are others out there who have tried the same approach, and countless other approaches (self-declared at least), yet get 0 value from them, or negative value. These are the people I'm curious about :)
