Hacker News
by jghn 751 days ago
This is looking at the wrong metric. I'm not expecting it to be 100% correct when I use it. I expect it to get me in the ballpark faster than I would have on my own. And then I can take it from there.

Sometimes that means I have a follow on question & iterate from there. That's fine too.

10 comments

> I expect it to get me in the ballpark faster than I would have on my own.

This is great if you are an experienced developer who can tell the difference between "in the ballpark" and fixable and "in the ballpark" but hopeless.

If 52% of responses have a flaw somewhere, then 48% of responses are flawless.

That is amazing and important.

The headline should be "LLM gives flawless responses to 48% of coding questions."

There are articles every day about how AI is replacing programmers, coding is dead etc, including from the nvidia ceo this week. This kind of thing shows we are not quite there yet. There are lots of folks on twitter etc who rave about how genAI built a full app for them, but in my experience that comes with a huge amount of human trial and error and understanding to know what needs to be tweaked.
> This kind of thing shows we are not quite there yet

I think you probably need to consider the time it took to go from 100% wrong, to 90, to 80, etc. My guess is that interval is probably shrinking from milestone to milestone. This causes me to suspect that folks starting SWE careers in 2024 will likely no longer be SWEs by 2030.

That’s why I tell my grandkids they should consider plumbing and HVAC trades instead of college. My bet is that within 10 years nearly every vocation that requires a college degree will be made partially or completely obsolete by AI.

I tell my grandkids that vocational school is a perfectly decent and honorable way to get into a trade that pays better than retail.

I also tell them that a good university is a perfectly decent and honorable way to begin a life of the mind. It's not the only way, and a life of the mind isn't the life everyone wants.

I also tell them that the purpose of a university education is mostly not about training for a job.

something could be flawed but still arguably correct. the headline is fine. i wouldn't buy a calculator or read documentation that is 48% correct.
> This is great if you are an experienced developer who can tell the difference between "in the ballpark" and fixable and "in the ballpark" but hopeless.

While true, Stack Overflow wasn't much different. New devs would go there, grab a chunk of code and move on with their day. The canonical example being the PHP SQL injection advice shared there for more than a decade.
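For anyone who hasn't seen that class of bug, here is a minimal sketch of it. It uses Python's standard `sqlite3` module as a stand-in for the PHP/MySQL code those answers dealt with, and the `users` table, the function names, and the payload are all made up for illustration.

```python
import sqlite3


def make_db():
    # In-memory database with a hypothetical "users" table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable pattern: user input is spliced directly into the SQL text.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized query: the driver passes the value separately from the SQL.
    query = "SELECT name, secret FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "' OR '1'='1"
    print(len(find_user_unsafe(conn, payload)))  # every row leaks
    print(len(find_user_safe(conn, payload)))    # no row matches the literal string
```

The unsafe version looks fine when you test it with a normal name, which is probably part of why it got copied around for so long.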

> While true, Stack Overflow wasn't much different.

StackOverflow has the advantage that, if an answer is wrong, other users can downvote it, and/or leave a comment explaining the mistake.

That’s how you, an experienced programmer, use it.

What does this do to beginners that are just learning to program? Is this helping them by forcing them to become critical reviewers or harming them by being a bad role model?

How is this any different than the age old "googling stack overflow" method everyone's been using for years?
Stack Overflow is a community with an answer-rating system and there is often some level of review from other people commenting on the answer's advantages and shortcomings. You often have multiple answers to choose from too. Those features build trust in an answer or prompt you to look elsewhere.

The UI for an LLM answer would have difficulty replicating the same thing since every answer is (probably) a new one and you have no input from other people about this answer too.

Edit: After writing my reply, I saw that roughly four other people (so far) made the exact same point and posted it a couple of minutes before. I think your question is a good one (made me think a little) and apologies it feels like you're being piled on.

I've never been that fond of SO, but I find chatgpt very useful. SO tends to be simpler one time questions. My interactions with gpt are conversations working towards a solution.

LLMs totally have a beginner problem. It's much like the problem where a beginner knows they need to look something up but can't figure out the right keywords to search for.

Also, chatgpt has never called me an idiot for asking a stupid question when it was the one that hadn't read the question properly and had assumed, after skimming it, that it was the same as an existing question. I wouldn't ask SO a question these days; the response is more likely to be toxic than helpful.

LLMs have been trained on these answers, and can generate "it depends" too. Sometimes they're even too patronising and non-committal.

Chat interface has an advantage of having user-specific context and follow up questions, so it can filter and refine answers for the user.

With StackOverflow search it's up to the user to judge whether the answer they've found applies to their situation.

> Chat interface has an advantage of having user-specific context and follow up questions, so it can filter and refine answers for the user.

But it will continue to outright lie to you in those follow-ups.

https://chatgpt.com/share/767d2810-b38f-46e2-8cde-09248bb636...

Because on Stack Overflow you also see feedback from hopefully a cross section of the developer community, several methods of solving the problem, voting feedback on those solutions, and can suss out a proper solution vs. just having one answer fed back to you.
Stack Overflow generally upvotes answers that are correct. There have to be some lemons, but I doubt it's at 52%.
Because people on stack overflow don't lie to the person who wrote the question very often. A correct answer to a problem that isn't the same as your problem is a better resource for learning than an incorrect answer to your exact problem.
The wrong answers on SO can be commented on and/or edited.
If you’re learning to program pretty much the best approach is to write and debug programs. There isn’t a shortcut

It’s like the saying “the fog of war” (at best you have incomplete and flawed information). Programming is just like that

Agreed; but isn’t this on the same continuum as programming assembly -> IDE autocomplete -> LLM autocomplete? You’re still writing code, but generally adding abstractions has been net good (unsure of this opinion tbf, but that’s my hunch)
> If you’re learning to program pretty much the best approach is to write and debug programs.

I'd argue the best way to learn is to read a lot of production-quality code to get a sense of structure and best practices in any given language.

I’ve listened to 10,000 hours of piano music and I still can’t play anything

Debugging is the primary skill of a programmer. 90% of programming is fixing bugs, the other 10% is writing bugs.

Maybe the LLM is teaching debugging by giving bad examples :)

> I’ve listened to 10,000 hours of piano music and I still can’t play anything

I don't think this is a valid comparison. If you'd read 10,000 hours of sheet music I'd wager you'd know how to read music.

When I studied in Ulaanbaatar I met a professor of linguistics from eastern Europe. Before he came to Mongolia he studied a Mongolian grammar book and tried to teach himself. He was rather proud of how far he had come.

At the first lesson he realised that the characters he thought he knew how to pronounce didn't sound much like what he was used to. Mongolian is generally written in Cyrillic plus a few more characters, so he expected it to be like Russian or Bulgarian with a few more sounds.

This is not the case. Mongolian is much more closely related to Korean and Tibetan, and commonly sounds something like drunk cats haggling over something deceased.

I find it to be roughly the same with introductory or otherwise shallow learning material about programming. You can read as many tutorials as you want, you'll still suck at it.

When LLMs invent books like SICP, The Art of Computer Programming, Purely Functional Data Structures, Gang of Four, then they might become tutors in this area. To me it seems they struggle hard with anything longer than a screenful.

> What does this do to beginners that are just learning to program? Is this helping them by forcing them to become critical reviewers or harming them by being a bad role model?

Harming them.

I told a new grad employee to write some unit tests for his code, explained the high level concepts and what I was looking for, and pointed him at some resources. He spun his wheels for weeks, and it turned out he was trying to get ChatGPT to teach him how to do it, but it would always give him wrong answers.

I eventually had to tell him, point blank, to stop using ChatGPT, read the articles, and ask me (or a teammate) if he needed help.

Beginners tend to write awful code without GPT's help, so I don't think it makes things worse.

Answers don't exist in a vacuum. The chat interface allows feedback and corrections. Users can paste an error they're getting, or even say "it doesn't work", and GPT may correct itself or suggest an alternative.

> Beginners tend to write awful code without GPT's help, so I don't think it makes things worse.

> Answers don't exist in a vacuum. The chat interface allows feedback and corrections. Users can paste an error they're getting, or even say "it doesn't work", and GPT may correct itself or suggest an alternative.

I think you're making the mistake of viewing the job as a black box that produces output.

But what you're proposing is a terrible way to develop someone's skills and judgement. They won't develop if they're getting their hand held all the time (by an LLM or a person), and they'll stagnate. The problem with an LLM, unlike a person, is that it will hold your hand forever without complaint, while giving unreliable advice.

That's speculation about a hypothetical person, one that falls into learned helplessness, but there are people with different mindsets.

Getting some results with the help of infinitely-patient GPT may motivate people to learn more, as opposed to losing motivation from getting stuck, having trouble finding right answers without knowing the right terminology, and/or being told off by StackOverflow people that's a homework question.

People who want to grow, can also use GPT to ask for more explanations, and use it as a tutor. It's much better at recalling general advice.

And not everyone may want to grow into a professional developer. GPT is useful to lots of people who are not programmers, and just need to solve programming-adjacent problems, e.g. write a macro to automate a repetitive task, or customize a website.

> Getting some results with the help of infinitely-patient GPT may motivate people to learn more, as opposed to losing motivation from getting stuck, having trouble finding right answers without knowing the right terminology,

> ...People who want to grow, can also use GPT to ask for more explanations, and use it as a tutor. It's much better at recalling general advice.

The psychology there doesn't make sense, since the technology simultaneously takes away a big motivation to actually learn how to get the result on your own. It's like giving a kid a calculator and expecting him to use it to learn mental arithmetic. Instead, you actually just removed the motivation for most kids to do so.

I think there's a common, unstated assumption in tech circles that removing "friction" and making things "easier" is always good. It's false.

Also, a lot of what you said feels like a post-hoc rationalization for applying this particular technology as a solution to a particular problem, which is a big problem with discourse around "AI" (just like it was with blockchain). That stuff is just in the air.

> ...and/or being told off by StackOverflow people that's a homework question.

IMHO, that's the one legitimately demotivating thing on your list.

In my experience it never actually fixes the problem. It either gives you a random change back, or gives you the same solution.
Same could be said of wrong Stack Overflow answers or random Google results. Clearly they’ll become critical of the results if the code simply doesn’t compile, the same way our generation sharpened our skills by filtering bad from good in Google results.
If this increases iteration speed for beginner devs and they learn about code quality after it goes into the real world, it’s not a bad bargain to strike imo.

I think we all partly learnt about code quality by having our code break things in the real world.

I've been saying from the start that this is not a tool for beginners and learners. My students use it constantly and I keep telling them that when they go to ChatGPT for answers, it's like they are going to a senior for help -- they know a lot but they are often wrong in subtle and important ways.

That's why classes are taught by professors and not undergrads. Professors are at least supposed to know what they don't know.

When students think of ChatGPT as their drunk frat bro they see doing keg stands at the Friday basement party rather than as an expert they use it differently.

The same is true of things like code generators and scaffolds.
I think the key difference is these are not promising a solution, just a foundation.
From the article:

> What's especially troubling is that many human programmers seem to prefer the ChatGPT answers. The Purdue researchers polled 12 programmers — admittedly a small sample size — and found they preferred ChatGPT at a rate of 35 percent and didn't catch AI-generated mistakes at 39 percent.

you're giving your fellow programmers way too much credit. most people will never bother doing that.
Absolutely, it's especially useful when it suggests which libraries to use if you're not familiar with the ecosystem. Or writing boilerplate for popular frameworks, step by step. It can, to a degree, repair errors if you paste it the output.
Might not be a good idea for people not security aware: https://vulcan.io/blog/ai-hallucinations-package-risk/#h2_4
Every time I’ve tried it, it’s sent me to completely the wrong ballpark, and after a while whacking at its solution I end up dumping it completely and doing it myself.
Exactly! Just because part of the answer isn't right, doesn't mean the entire answer is useless. It's much faster than only doing a Google search when working out the solution to a problem.
Sometimes you are going to lose a lot of time trying to make a ChatGPT solution work when Google would have provided the right answer right away... Just yesterday I asked ChatGPT for an AWS IAM policy. ChatGPT-4o provided an answer that looked OK but was just wrong; I tried to make it work without success. I just Googled it and the first result gave me the right answer.
I prefer Phind for this type of question since you can see search results that it's likely drawing answers from.

But ChatGPT is often a huge time saver if you know exactly what you want to do and just let it fill in the how. For example: "I have these 3 jsonl files and I want to use jq to do blah blah and then convert them to csv."
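To make that kind of request concrete, here is a rough sketch of just the conversion step. It is written in Python rather than jq, the "blah blah" filtering is left out because it isn't specified, and the function name `jsonl_to_csv` and the sample records are made up. It also assumes the records are flat objects; nested values would need flattening first.

```python
import csv
import io
import json


def jsonl_to_csv(jsonl_streams, out):
    """Merge several JSONL streams into one CSV written to `out`.

    Returns the number of records written. Assumes each non-blank line
    is a flat JSON object.
    """
    rows = []
    fieldnames = []
    for stream in jsonl_streams:
        for line in stream:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            rows.append(record)
            # Collect column names in first-seen order across all files.
            for key in record:
                if key not in fieldnames:
                    fieldnames.append(key)
    # restval fills in columns that a given record does not have.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    first = io.StringIO('{"id": 1, "name": "a"}\n')
    second = io.StringIO('{"id": 2, "name": "b", "team": "x"}\n')
    out = io.StringIO()
    jsonl_to_csv([first, second], out)
    print(out.getvalue())
```

With real files you would pass open file objects instead of the `io.StringIO` samples. The point stands either way: you can only check something like this quickly if you already know what the output should look like.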

This is how we've all adapted to use these tools, but it's not what was originally pitched.
This. For inexperienced developers, my advice is this: don't consume answers you don't understand. If you can't read it, interrogate it, and find a question at your own level. When you accept its emission, you're taking responsibility for it, and beyond a certain low level, it can't do your thinking for you.
I agree this is the correct way to use it, and it is incredibly useful in that case, but I think a study like this is valuable in the face of all the hype/fud about how AI Agents can program entire complex applications with just a few prompts and/or will replace software engineers shortly.