by whynotminot 121 days ago
It’s also pretty wild to me how people still don’t really even know how to use it.

On hacker news, a very tech literate place, I see people thinking modern AI models can’t generate working code.

The other day in real life I was talking to a friend of mine about ChatGPT. They didn’t know you needed to turn on “thinking” to get higher quality results. This is a technical person who has worked at Amazon.

You can’t expect revolutionary impact while people are still learning how to even use the thing. We’re so early.

10 comments

I don't think "results don't match promises" is the same as "not knowing how to use it". I've been using Claude and OpenAI's latest models for the past two weeks now (probably moving at about 1000 lines of code a day, which is what I can comfortably review), and it makes subtle hard-to-find mistakes all over the place. Or it just misunderstands well-known design patterns, or does something boneheaded. I'm fine with this! But that's because I'm asking it to write code that I could write myself, and I'm actually reading it. This whole "it can build a whole company for me and I don't even look at it!" is overhype.
Prompting LLMs for code simply takes more than a couple of weeks to learn.

It takes time to get an intuition for the kinds of problems they've seen in pre-training, what environments they faced in RL, and what kind of bizarre biases and blind spots they have. Learning to google was hard, learning to use other people's libraries was hard, and it's on par with those skills at least.

If there is a well-known design pattern you know, that's a great thing to shout out. Knowing what to add to the context takes time and taste. If you are asking for pieces so large that you can't trust them, ask for smaller pieces and their composition. It's a force multiplier, and your taste for abstractions as a programmer is one of the factors.

In early usenet/forum days, the XY problem described users asking for implementation details of their X solution to Y problem, rather than asking how to solve Y. In LLM prompting, people fall into the opposite trap. They have an X implementation they want to see, and rather than ask for it, they describe the Y problem and expect the LLM to arrive at the same X solution. Just ask for the implementation you want.

Asking bots to ask bots seems to be another skill as well.

Let me clarify, I've been using the latest models for the last two weeks, but I've been using AI for about a year now. I know how to prompt. I don't know why people think it's an amazing skill, it's not much different from writing a good ticket.
Writing a good ticket is not a common skill. IMO it seems deceptively easy but usually requires years of experience to understand what to include and express it in the most concise yet unambiguous terms possible for the intended audience.
Do you use an agent harness to have it review code for you before you do?

If not, you don't know how to use it efficiently.

A large part of using AI efficiently is to significantly lower that review burden by having it do far more of the verification and cleanup itself before you even look at it.

This is correct, but part of the issue is that it significantly increases token usage costs. Some companies are doing:

- PRD and spec fulfillment review

- code review + correction loops

- security review + corrections

- addl. test coverage and tidying

- addl. type checks and tidying

- addl. lint checks and tidying

- maybe more I haven't listed

And these are run after each commit, so you can only imagine the costs per engineer doing this 10, 20, 50+ times per day depending on how much work they're knocking out.
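To put rough numbers on that, here is a back-of-the-envelope sketch. Every parameter is a placeholder I made up for illustration (tokens per pass, passes per commit, price per million tokens, working days per year); none of it comes from a real bill or a real provider's price list, so plug in your own figures.

```python
def daily_review_cost(tokens_per_pass, passes_per_commit, commits_per_day,
                      usd_per_million_tokens):
    """Estimated daily spend for one engineer running review passes on every commit.

    All four inputs are assumptions supplied by the caller.
    """
    tokens = tokens_per_pass * passes_per_commit * commits_per_day
    return tokens / 1_000_000 * usd_per_million_tokens


def annual_team_cost(daily_cost_per_engineer, engineers, working_days):
    """Scale a per-engineer daily cost to a whole team over a working year."""
    return daily_cost_per_engineer * engineers * working_days
```

The cost is a plain product, so it grows linearly with each factor: doubling the number of passes per commit or the number of commits per day doubles the bill.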

Sure, it adds tokens. I've burnt 200 million tokens today on a single project.

The question is what your time is worth to the company, and which tasks cost less to have an agent automate than to have you do.

I think if you were to scale that kind of usage across a reasonable team size, costs would start to add up fast — and possibly beyond the cost of paying another engineer every year, especially if a lot of your teammates are new to AI, or aren't using it efficiently. Of course, it all depends on the appetite of the company.

The other constraint is, for those who are being laid off (maybe because of cost reduction to support an AI budget for a smaller team to use), engineers wanting to expand their skill set and practice these levels of usage + efficiency are effectively unable to with their own funding, making it more difficult to find employment as expectations heighten.

Prior to AI entering the fray, software development was largely free for everyone, allowing anyone with enough time and motivation to build the skills towards gainful employment. As AI becomes more prevalent and expectations around how it's used become higher, fewer and fewer applicants will be able to claim they have the experience necessary because it was out of reach due to costs.

> Do you use an agent harness to have it review code for you before you do?

Right now you need to be Uncle Moneybags to do this in your personal life.

If you're lucky, your employer is footing the bill but otherwise... Ugh. It's like converting your app running perfectly fine on a cheap VPS to AWS Lambda. In theory, it's fine but in reality the next bill you get could make you faint.

It's down to how much you value your time. If you value your time low enough, it doesn't pay to make AI take over. If you value it high enough, it does.
I have it run tests and every few days I ask it to do a code quality analysis check on the codebase.

I'm unconvinced AI reviewing AI is the answer here, because all LLMs have the same flaws. To me, the harness/guard rails for AI should be different technologies that work differently and in a more formal sense. IE, static code analysis, linters, tests, etc.

(Linters have actually been, by far, the BEST code quality enforcers for the agents I've run so far, and they're a lot cheaper and more configurable than running more agents.)
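A minimal sketch of what I mean by a deterministic gate, assuming each check is just a command that exits non-zero on failure. The names `run_checks` and `gate` are made up for this example, and the two demo commands are stand-ins for whatever linter and test runner the project already uses.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks):
    """Run each (name, argv) pair and record whether it exited with status 0."""
    results = []
    for name, argv in checks:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append(
            CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr)
        )
    return results


def gate(results):
    """Return the names of failed checks; an empty list means the change may proceed."""
    return [r.name for r in results if not r.passed]


# Stand-in commands (hypothetical): one that succeeds and one that fails,
# in place of a real linter and a real test runner.
demo_checks = [
    ("fake-tests", [sys.executable, "-c", "pass"]),
    ("fake-lint", [sys.executable, "-c", "import sys; sys.exit(1)"]),
]
failed = gate(run_checks(demo_checks))
print("blocked by:", ", ".join(failed) if failed else "nothing")
```

The point of the design is that the pass/fail decision never goes through a model: the agent can read the captured output and try again, but only an exit status of 0 lets the change through.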

What sort of agent harness setup do you recommend?
If you know good architecture and you are testing as you go, I would say, it is probably pretty damn close to being able to build a company without looking at the code. Not without "risk" but definitely doable and plausible.

My current project that I started this weekend is a rust client server game with the client compiled into web assembly.

I do these projects without reading the code at all as a way to gauge what I can possibly do with AI without reading code, purely operating as a PM with technical intuition and architectural opinions.

So far Opus 4.6 has been capable of building it all out. I have to catch issues and I have asked it for refactoring analysis to see if it could optimize the file structure/components, but I haven't read the code at all.

At work I certainly read all the code. But would recommend people try to build something non trivial without looking at the code. It does take skill though, so maybe start small and build up the intuition on how they have issues, etc. I think you'll be surprised how much your technical intuition can scale even when you are not looking at the code.

Security auditors and criminals have a bright future ahead of them.
That is why I said "risk". Though the models are pretty good "if" you ask for security audits. Notice I didn't say you could do it without technical knowledge right now, so you need to know to ask for security review.

I have friends in security on major platforms who are impressed by the security review of the SOTA models. Certainly better than the average bootstrapped founder.

For a few years maybe, but I see little reason to think this stuff won't be coming for their jobs as well.
True, but you'd be surprised how much you can tighten up a codebase by asking a heftier model to do a security review and suggest fixes.
At what point do people really know if it has been tightened up if they never look at the code?
That's the catch -- a team would need to care enough about quality, or not care at their own peril.
How does a PM know that the code has been tightened up by the offshore team?
Why is there a point of not reading the code? Even with very competent humans we have put in place systems for reviewing the code.
What’s the game? Genuinely curious!
Simultaneous turn-based top-down car combat where you design the cars first. Inspired by Car Wars, but taking advantage of computers, so spline-based path planning and a much more complicated way of calculating armor penetration and damage.

I'm building to play with my friends online.

> and it makes subtle hard-to-find mistakes all over the place.

I agree. I'm constantly correcting the code it generates. But then, I do the same for humans when I review their PRs, and the LLM generated the code in a 100th of the time (or whatever figure you prefer).

And yet, this is exactly what my last job's engineering & product leadership did with their CEO at the helm, before they laid me off.

They vibe-coded a complete rewrite of their products in a few months without any human review. Hundreds of thousands of LOC. I feel sorry for the remaining engineers, who have to learn everything that was just generated and that customers are now using.

You are assuming that we all work on the same tasks and should have exactly the same experience with it, which is of course far from the truth. It's probably best to start with that base assumption and work on the implications from there.

As for the last example, for all the money being spent on this area, if someone is expected to perform a workflow based on the kind of question they're supposed to ask, that's a failure in the packaging and discoverability of the product. The leaky abstraction only helps those of us who know why it's there.

I’ve been helping normal people at work use AI and there’s two groups that are really struggling:

1. People who only think of using AI in very specific scenarios. They don’t know when to use it outside of the obvious “to write code” situations, they don’t really use AI effectively, and they get deflated when AI outputs the occasional garbage. They think “isn’t AI supposed to be good at writing code?”

2. People who let AI do all the thinking. Sometimes they’ll use AI to do everything and you have to tell them to throw it all away because it makes no sense. These people also tend to dump analyses straight from AI into Slack because they lack the tools to verify if a given analysis is correct.

To be honest, I help them by teaching them fairly rigid workflows like “you can use AI if you are in this specific situation.” I think most people will only pick up tools effectively if there is a clear template. It’s basically on-the-job training.

> On hacker news, a very tech literate place, I see people thinking modern AI models can’t generate working code.

I am completely flooded with comments and stories about how great LLMs are at coding. I am curious how you get a different picture than this. Can you point me to a thread or a story that supports your view? At the moment, individuals thinking AI cannot generate working code seem almost nonexistent to me.

It's a real thing, but usually tied to IT folks that tried ChatGPT ~2 years ago (in a web browser) and had to "fix" whatever it output. That situation solidified their "understanding of AI" and they haven't updated their knowledge on the current situation (because... No pressing need).

Folks like this have never used AI inside of an IDE or one of the CLI AI tools. Without that perspective, AI seems mostly like a gimmick.

> On hacker news, a very tech literate place

I think this is the prior you should investigate. That may be what HN used to be. But it's been a long time since it has been an active reality. You can still see actual expert opinions on HN, but they are the minority more and more.

I think one longtime HN user (Karrot_Kream I think) pinpointed the change in HN discourse to sometime in mid 2022 to early 2023 when the rate of new users spiked to 40k per month and remained at that elevated rate.

From personal experience, I've also noticed that some of the most toxic discourse and responses I've received on this platform are overwhelmingly from post-2022 users.

HN got a write-up in a highly political, non-technical magazine around that time.
It's still September.
>You can’t expect revolutionary impact while people are still learning how to even use the thing. We’re so early.

What makes you think this will ever change? Have you seen how well people know their already existing tools?

In a WhatsApp group full of doctors, managers, journalists and engineers (including software) aged 30-60, I asked if anyone had heard of OpenClaw; only 3 people had heard of it, from influencers, and none had used it.

But from my social feed the impression was that it is taking over the world:)

I asked because I have been building something similar for some time and I thought it was over, that they were faster than me, but as it appears there’s no real adoption yet. Maybe there will be some once they release it as part of ChatGPT, but even then it looks too early, as few people are actually using the more advanced tools.

It’s definitely at a very early stage. It appears that so far the mainstream success in AI is limited to slop generation, and even that is actually a small number of people generating huge amounts of slop.

> I asked if anyone heard of twitter vaporware and only 3 people heard of it from influencers, none used it.

Shocking results, I say!

No, these people ("managers, engineers" etc.) just do not work in tech & IT but in other fields, and they do not read tech news in your country etc.

Most people are just not "that deep in there" the way most people on HN are.

I spend between 1 and 2h a day on hn and I barely know what openclaw is. I've seen it mentioned once or twice and checked their website but that's all.

If one lets AI FOMO since the release of chatgpt drive them they'd be glued to their screen 24/7.

What is happening is nonsensical.

OAI wants to keep the hype train going. That is all. OpenClaw is just a project that attracted the interests of people messing about with LLMs. Which as a proportion of economically active people is.... tiny.

They brought him (Pete) over as he seems to have some way of thinking about LLMs in the form of a product. Will he have repeatable success on a large scale? Who knows. I doubt it personally.

> “Tech news”

A guy attached Claude to his socials, groundbreaking tech.

Once I was working for a consulting & development company; they were trying to enter sector ABC by staffing up a team of people who, so I was told, had an interest in sector ABC stuff and wanted to do some projects there.

While they were deep in software development in general, none of them read any of the essential/required daily industry news (not even the news related to doing software development in sector ABC).

:-)

So no, even people somehow attached to a topic are not necessarily somehow deeper involved.

> I asked because I have been building something similar for some time and I thought it was over, that they were faster than me

If you have been working on a use case similar to OpenClaw for some time now I'd actually say you are in a great position to start raising now.

Being first to market is not a significant moat in most cases. Few people want to invest in the first company in a category - it's too risky. If there are a couple of other early players then the risk profile has been reduced.

That said, you NEED to concentrate on GTM - technology is commodified, distribution is not.

> It appears that so far the mainstream success in AI is limited to slop generation and even that is actually a small number of people generating huge amounts of slop

The growth of AI slop has been exponential, but the application of agents for domain specific usecases has been decently successful.

The biggest reason you don't hear about it on HN is because domain-specific applications are not well known on HN, and most enterprises are not publicizing the fact that they are using these tools internally.

Furthermore, almost anyone who is shipping something with actual enterprise usage is under fairly onerous NDAs right now and every company has someone monitoring HN like a hawk.

Do you think that it is a good idea to release it first on iOS and announce it on HN and Product Hunt? How would you do it?

On my app the tech is based on running agent-generated code on JavaScriptCore to do things like OpenClaw. I’m wrapping the JS engine with the missing functionality like networking, file access and database access, so I believe I will not have a problem releasing it on the Apple App Store as I use their native stack. Then, since this stack is also open source, I’m making a version that will run on Linux, the idea being that users develop their solution on their device (iOS & Mac currently), see it working, and then deploy it on a server with a tap of a button, so it keeps running.

Who's your persona? How are you pricing and packaging? Who is your buyer? Are you D2C? Consumer? Replacing EAs? Replacing Project Managers? ...

You need to answer these questions in order to decide whether a Show HN makes sense versus a much more targeted launch.

If you do not know how to answer these questions you need to find a cofounder asap. Technology is commodified. GTM, sales, and packaging is what turns technology into products. Building and selling and fundraising as 1 person is a one-way ticket to burnout, which only makes you and your product less attractive.

I also highly recommend chatting with your network to understand common types of problems. Once you've identified a couple classes of problems and personas for whom your story resonates, then you can decide what approach to take.

Best of luck!

The persona is someone who knows what they are doing but needs someone to actually automate their work routine. E.g. maybe it’s a crypto trader who makes decisions based on signal interpretation, so they can create a trading bot that executes their method. Maybe it’s a compliance officer who needs to automate some routine like checking details further when some conditions arise. Or maybe a social media manager who needs to moderate their channels. Maybe someone who needs a tool for monitoring HN in that specific way?

Thanks for the advice! I’m at a stage where I want to have such a tool and see who else wants it. Not sure yet about its viability as a business and what the exact market is. Maybe I will find out by putting it into the wild, and that’s why I am considering releasing it as a mobile app first.

That's too broad. You aren't going to get any nibbles.

You need to narrow it down to a single and specific persona and business domain.

This is because it takes years to fully flesh out and productionize a workflow from scratch, so concentrating on a business domain you know intimately well helps you build that muscle, which you can then repeat if you are able to hit revenue metrics for a Series A/B.

That persona still sounds too generic, too unfocused.

But even with that persona, it should already answer your question whether posting on HN and producthunt should be a core part of your strategy. Not a lot of social media managers or compliance people around here. And even for crypto traders there are better places to pitch products to them

> every company has someone monitoring HN like a hawk.

Monitoring specific user accounts or keywords? Is this typically done by a social media reputation management service?

And it will get worse once the UX people get ahold of it.
You got that right... imagine AI making more keyboard shortcuts, "helping" wayland move off X more so, new window transitions, overhauling htmx... it'll be hell+ on earth.
We can indeed only imagine. For now, AI has been a curse for open source projects.
A neighbour of mine has a PhD and is working in research at a hospital. He is super smart.

Last time he said: "yes yes I know about ChatGPT, but I do not use it at work or home."

Therefore, most people won't even know about Gemini, Grok or even Claude.

He said he knows about it and your conclusion is that he doesn't know about the other ones...
No, my conclusion is that if even smart people do not elaborate on any usage of AI (regardless of which brand they pick), that is a clear sign that we are not talking about "mass adoption".
> I see people thinking modern AI models can’t generate working code.

Really? Can you show any examples of someone claiming AI models cannot generate working code? I haven't seen anyone make that claim in years, even from the most skeptical critics.

I've seen it said plenty of times that the code might work eventually (after several cycles of prompting and testing), but even then the code you get might not be something you'd want to maintain, and it might contain bugs and security issues that don't (at least initially) seem to impact its ability to do whatever it was written to do but which could cause problems later.
Yeah but that's a completely different thing.
And really the problem isn’t that it can’t make working code, the problem is that it’ll never get the kind of context that is in your brain.

I started working today on a project I hadn’t touched in a while but now needed to, as it was involved in an incident where I needed to address some shortcomings. I knew the fix I needed to do but I went about my usual AI assisted workflow because of course I’m lazy and the last thing I want to do is interrupt my normal work to fix this stupid problem.

The AI doesn’t know anything about the full scope of all the things in my head about my company’s environment and the information I need to convey to it. I can give it a lot of instructions but it’s impossible to write out everything in my head across multiple systems.

The AI did write working code, but despite writing the code way faster than me, it made small but critical mistakes that I wouldn’t have made on my first draft.

For example, it just added in a command flag that I knew that it didn’t need, and it actually probably should have known it, too. Basically it changed a line of code that it didn’t need to touch.

It also didn’t realize that the curled URL was going to redirect so we needed an -L flag. Maybe it should have but my brain knew it already.

It also misinterpreted some changes in direction that a human never would have. It confused my local repository for the remote one because I originally thought I was going to set a mirror, but I changed plans and used a manual package upload to curl from. So it put the remote URL in some places where the local one should have been.

Finally, it seems to have just created some strange text gore while editing the readme where it deleted existing content for seemingly no reason other than some kind of readline snafu.

So yes it produced very fast great code that would have taken me way longer to do, but I had to go back and consume a very similar amount of time to fix so many things that I might as well have just done it manually.

But hey I’m glad my company is paying $XX/month for my lazy workday machine.

>>The AI doesn’t know anything about the full scope of all the things in my head about my company’s environment and the information I need to convey to it.<<

This is your problem: How should it know if you do not provide it?

Use Claude - in the pro version you can submit files for each project which are setting the context: This can be files, source code, SQL scripts, screenshots whatever - then the output will be based on your context given by providing these files.

Is this process of brain dumping faster than me just writing the code?

If I was truly going to automate this one-time task I would have to give the AI access to my browser or an API token for the repository provider, so I’m either giving it dangerous modification capability via browser automation or I’m spending even more time setting up API access and trusting that it actually knows how to interact with the service via API calls.

My company doesn’t provide Claude, they give me GitHub Copilot Pro or whatever it’s called, and when I provided the website it needed to get the RPM files I was working with it didn’t actually do anything with it. It just wrote a readme file that told me what to do. Like I mentioned, it also eventually mistook the remote repository for my local internal repository.

And one of the specific commands it screwed up was in my existing script and was already correct, it just decided to change it for no discernible reason. I didn’t ask it to do anything related to that particular line.

With such a high error rate, I would be hesitant to actually integrate AI to other systems to try to achieve a more fully automated workflow.

And your problem is that you didn't understand the point of their post. The full context was so complex and would be so time consuming to relay that they might as well code it themselves.
Depends what they mean. Generate working code all the time, or after a few iterations of trying and prompting? It can very easily happen that an LLM generates something that is a straight error, because it hallucinates some keyword argument or something like that, which doesn't actually exist. It happened to me only yesterday. So going from that, no, they are still not able to generate working code all the time. Especially when the basis is a shoddily made library that is simply missing something required.
10 days ago someone was making this claim about copilot on legacy code: https://news.ycombinator.com/item?id=46932609

> Github Copilot has been great in getting that code coverage up marginally but ass otherwise.

That's a completely different claim. Or do you think an AI can always, without fail, produce working code in every situation? That's trivially false.
It's also trivially true that an AI has at least once been able to write a working hello world.

When someone claims that AI can't generate working code I assume that it means consistently generating working code. We're talking about a tool. It has to work more often than not and on codebases that we tend to work with, i.e legacy code.

Personally I don't claim that because I'm using it every day to generate working code.

I'll claim it. They can't generate working code for the things I am working on. They seem to be too complex or in languages that are too niche.

They can do a tolerable job with super popular /simple things like web dev and Python. It really depends on what you're doing.

Scroll up a few comments where someone said Claude is generating errors over and over again and that Claude can't work according to code guidelines etc :-))
That's not the same.