Hacker News
by maccard 16 days ago
> Let’s face it: by the time I manually ship version 1.0 of a product, the AI-assisted version could have been deployed 10x faster.

Show the receipts. Where are the mobile apps, the Photoshop replacements, the video and audio editors, the games and game engines that took a decade to make in the past that have shipped since Claude Code came along?

> By then, enough real-world feedback would have surfaced to identify the major issues, and tools like Claude Code would make it possible to fix and ship version 2.0 at an incredible pace.

Again where are the receipts?

My experience with coding agents is that they’re perfectly good at generating a v0.1 that just about passes the sniff test. It does the first 90%, but the second 90% always takes longer than the first 90%. That second 90% is what coding agents are terrible at, and it’s what makes a product actually good.

Being 10x (or whatever multiplier) faster at programming doesn't mean you're going to be 10x faster in designing a product or any other aspect that goes into making a good product.

Even if you hired an actual programmer, it'd take a massive amount of time to build a Photoshop clone.

Of course, at the end Photoshop is lines of code and it could be output as is, end to end. One problem is that users don't generally provide design documents precise enough to admit only one interpretation in code. Another is that a design document at any level of precision, short of the code itself, can be interpreted in multiple ways when it comes to a specific implementation.

LLMs also take a relatively long time to output acceptable code, often taking tens of minutes before giving you a small diff. The larger the codebase, the longer it usually takes to start producing code, even over an hour.

> Of course, at the end Photoshop is lines of code and it could be output as is, end to end

And yet it’s not.

> LLMs also take a relatively long time to output acceptable code, often taking tens of minutes before giving you a small diff. The larger the codebase, the longer it usually takes to start producing code, even over an hour.

The problem is that they don’t generate acceptable code, they generate code that needs to be edited to be acceptable. That has always been the slow part of engineering. Waiting an hour for a bugfix, even if it cost $75 in tokens, would be cheaper than hiring an engineer, but only if it worked. And it’s a bit like hiring a snake oil salesman: it passes the sniff test, and you only see the reality once you’re drowning in it. Your AI now takes 4 hours to fix the same bugs, it introduces new bugs, _and_ you don’t have anyone who can reduce that complexity. For a lot of us, that is immediately clear at first glance at the output of Claude and Codex and the like.

> And yet it’s not.

That was my point as well: it hasn't been output, even though it could be done by a talented solo developer given enough time, and current LLMs definitely aren't able to do it.

> The problem is that they don’t generate acceptable code, they generate code that needs to be edited to be acceptable.

You've never had an LLM output a one line bugfix that is correct to the point where you don't have to edit it?

To make things more concrete, here's an example from the creator of Redis on how he utilizes LLMs in programming: https://antirez.com/news/164

> You've never had an LLM output a one line bugfix that is correct to the point where you don't have to edit it?

I have. I’ve also had IDEs and static analysers do the same thing. I can also take my car out of gear and have it roll down a hill, but that doesn’t mean it can run without fuel. Only a Sith deals in absolutes, and in the general case LLMs don’t generate acceptable code.

My experience is that when I ask it to solve a clear, well-defined problem, on the scale of "add motion blur (linear, spin, and zoom) to the filters menu; include standard dialog box (see existing design) for user input on all options", this works something like 90% of the time, is obvious garbage 5% of the time (in my experience, when it claimed to be writing "unit tests" it was actually running regexes over the source code), and is subtly wrong the other 5% of the time.

If you use planning mode, make your first move in the project "write plan to reimplement photoshop", and then blindly say "continue" until the plan is done, you get a success rate of roughly 0.9^(number of features), which on the scale of Photoshop is of course going to be a failure. But this is still, in one sense, a 10x speedup: 9 times out of 10 you're only doing code review, not rewriting the code. But code review is a real cost, so it's 10x on writing code, not 10x on delivery.
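To see how quickly that compounds, here is a minimal Python sketch. It assumes, as a simplification, that each feature independently succeeds at the commenter's rough 90% rate and that the whole plan only succeeds if every feature does; the function name is hypothetical.

```python
def plan_success_probability(per_feature: float, features: int) -> float:
    """Chance that all `features` independent steps succeed, each with probability `per_feature`."""
    return per_feature ** features


if __name__ == "__main__":
    # Compare a small task list against something closer to Photoshop scale.
    for n in (1, 10, 50, 100):
        print(f"{n:>3} features: {plan_success_probability(0.9, n):.4%}")
```

At ten features the odds are already only about one in three (0.9^10 is roughly 0.349), and by a hundred they are effectively zero, which is why unattended "continue" loops break down long before Photoshop scale.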

Try it out, like, seriously: learn to use it well. Spend a few weeks with it. You wouldn't say these things if you were an experienced user of these tools. Saying their code is "unacceptable" is a skill issue. Describe what you consider "acceptable code" and watch it produce it in copious amounts. They don't have one mode or one setting; they can generate whatever the F you want in whatever style you deem "acceptable". You're completely in control.

That said, I watched many of my - generally pretty clever - colleagues struggle mightily with this. I can't put my finger on it yet. Regular "programming" - typing BS syntax one character at a time - always felt astronomically boring to me so I'm one of the guys happy with these tools. Not happy with how it will fuck up society though, but that's uh.. yeah.

As always in these discussions, I think people compare apples and oranges. LLMs are great with in-distribution solutions for solved problems with a lot of relevant prior material and in established technologies. Frontend stuff works great, for instance.

But for novel solutions, complex business logic, things deeply integrated with external systems... the code generation quickly turns terrible and useless. Especially if it's in anything but Python, Java, or JS.

Most of these differences in results end up being about differences in application. LLMs suck at out-of-distribution material, inherently.

That’s what I find funny about the “end of SaaS” talk.

If you are a graphic designer, you are not going to make your own Photoshop. Even if you could, the ROI is not there.

Graphic designers are of course a bad example, because everyone will just generate images directly with LLMs.

But a restaurant owner, for example, could build his own website with a menu. Heck, even just slapping together HTML without an LLM, making a decent website was easy, but these people didn’t have time for that.

I work in insurtech. The wet dream of every big company dealing with insurance was customer self-service, so they built those interactive forms. But no customer wants to fill them in: they don’t care, they don’t have time, they are busy running their own business, and they want to call or meet with someone who knows what needs to go in those forms. Chatting with an AI doesn’t fix that, because the business owner still has to spend his time answering all the questions that were in the form; only now it will be a chatbot asking them.

It's definitely not at the "make me a Photoshop" stage, but I don't think that angle adequately explains the tone of the anti-AI HN discussions taking place at all.

If it was just a toy with no shot at making something real people would go "oh cool have fun with that" and move on with their life. Instead we see pretty emotionally charged posts.

What stage is it at then? Because every new model is met with people saying they haven’t opened an editor in a year, or that they’re 10x more productive. But I’m not seeing 10x more tickets closed, 10x more bugs fixed, or 10x more feature designs that we can smash through. I see verbosity and noise when used by the people who were behind, and I see the same level of quality and excellence from people who use it like it’s another tool in their toolbelt.

> I see the same level of quality and excellence from people who use it like it’s another tool in their toolbelt

It's certainly currently an edge if you know what to tell the mystery box of magic in precise terms.

I don't think this distinction is going to endure, though: every level of "it can't do it" has fallen and generally faster & more decisively than predicted. We went from it printing hello world, to autocomplete where you still needed to know what the line should do, to autocompleting functions, to writing entire units, to working out architecture tradeoffs, to doing research, planning, architecture, execution, and testing all autonomously. That trajectory, plus people retreating to a nebulous "I'm adding taste", tells me this is going to sail straight past "tool in toolbelt" territory at Mach 10.

Everyone has their own perspective but to me "show me the receipt" at a specific point in time is a completely wrong lens for a tech that shows clear signs of exponential improvement (e.g. https://metr.org/).

I’ll agree it’s an edge. But edges aren’t worth the GDP of a middle sized European country.

> every level of "it can't do it" has fallen and generally faster & more decisively than predicted

I disagree. The agent + harness model was a huge leap and really moved the bar. The tools became genuinely useful for coding very quickly.

> Everyone has their own perspective but to me "show me the receipt" at a specific point in time is a completely wrong lens for a tech that shows clear signs of exponential improvement

At some point, the exponential improvements have to show results or it’s just a Ponzi scheme.

I get that, in your mind, the goalpost for results is a vibecoded, full-featured Photoshop, which is fair enough. That's no less arbitrary than me defining the goalpost as "can it make me a useful script?", by which measure it has already delivered results with receipts.

That's why I'm saying this whole look-at-one-point-in-time logic isn't useful here: depending on where you set the cutoff, you get diametrically opposed answers.

>Ponzi scheme

The IPO & bubble financial shenanigans are only loosely linked to the technological advancement. The tech genie is out of the bottle, people are intrigued, and the people with the ability to tinker on this are spread globally. Even if the entire Western tech & financial sphere disappeared tomorrow, progress here would wobble and slow, not stop.

Even the one global chokepoint where progress could have been killed, Taiwan, looks like it'll be de-risked shortly, given CXMT's and SMIC's rapid progress.

In my mind the base case assumption here has to be that the trend (that has been remarkably consistent) continues until proven otherwise. Stack enough improvements on top of each other and instead of a useful script you shall have your photoshop. Maybe...

> I get that, in your mind, the goalpost for results is a vibecoded, full-featured Photoshop, which is fair enough. That's no less arbitrary than me defining the goalpost as "can it make me a useful script?", by which measure it has already delivered results with receipts.

I think a vibe-coded Photoshop is what’s being talked about as the end goal by Anthropic and OpenAI: Dario is on record saying that AI will replace engineers, and Sam Altman has said “we will never ever write code by hand again. It doesn't make any sense to do so”. The Cursor founder has said that he ships 10x what his other engineers ship. I want to know where people _genuinely_ think the cutoff is, because there’s a lot of talk about where it’s not, and that’s moving the goalposts.

> The IPO & bubble financial shenanigans are only loosely linked to the technological advancement.

Except they’re not. Someone is paying for this compute power, and energy.

> In my mind the base case assumption here has to be that the trend (that has been remarkably consistent)

Making wild, insane promises about capability and then doing the same thing 3 months later when a new model releases? Isn’t it convenient that both OpenAI and Anthropic have models that are “too powerful” to release? How responsible of them.

At the same time, Anthropic, OpenAI, and Copilot have all recently switched enterprises to usage-based billing, as they’ve been undercharging by 10/100/1000x in many cases. Enterprises are limiting costs (Uber limiting spend to $1500/mo this week).

If these tools were really game-changing, integer productivity multipliers, why aren’t major engineering organisations spending all their hiring growth on these tools and getting ahead of their competitors? Because they’re “not quite ready”, just like they weren’t 6 months ago, and they still won’t be in 6 months.

I can't speak for others, but my main emotional issue with it is how it has addled the minds of my employers. If they judged on output and I objectively couldn't keep up, OK, fine, I'll use AI. But they judge on usage even if output is worse! Anyone would be upset if forced to abandon their skills and work in a different way that produces worse results and is less enjoyable. It's just stupid.

That’s because it’s really hard to evaluate results, but usage is comparable across people. It’s the equivalent of counting lines of code submitted or Jira tickets closed.

> If it was just a toy with no shot at making something real

This just isn't true. It isn't about what the LLM can do; it's what the executives think the LLM can do that's the problem.

If people just wanted to make their own stuff and have fun, that's fine. Knock yourself out.

However, it's launched this enormous tidal wave of mediocrity that emboldens the dumbest people to do the dumbest shit and make it my problem. I just had to yell at one of the IT guys for trying to hook up our Duo to Claude, and I'm still mad about it lol.

Stupid people are always going to do stupid shit. I don't think that makes the enabling technology the problem & the anger towards it is misdirected.

>trying to hook up our Duo to Claude

ngl that is hilarious

Yeah, but the idiots had to put in effort before, which was difficult for them for obvious reasons.

Now they have a cheerful idiot robot who can actually do the idiot things they dream up and it tells them how brilliant they are.

Is it this: GitLab Duo Agent Platform with Claude accelerates development [1]? Well, GitLab itself promotes hooking them up!

[1] https://about.gitlab.com/blog/gitlab-duo-agent-platform-with...

No, Duo as in Cisco Duo IAM.

It was a "please do not fuck around with a central pillar of our security infrastructure just because you can" thing lol.

What's a Duo?

Photoshop alternatives already exist and are cheap or free, like Affinity.

Why create a Final Cut Pro alternative when it’s so cheap from Apple?

Many existing apps can be mostly cloned by a small team over 6 months or a year, but the challenge is finding customers willing to switch. You still need to add something new and useful, then reach customers somehow.

How confident are you this is true?

> Many existing apps can be mostly cloned by a small team over 6 months or a year,

I have a vision of reimagining stuff that is currently done across 5 apps as a single app that is far simpler and has favourable economics. Most people won't figure this kind of stuff out; it requires a lot of imagination. And if it's done in stealth, the incumbents will be fcked.

My belief is that we are gonna see a lot of consolidation: things that used to be done across multiple apps will be done within one. At the same time, there will be a rise in apps that do one thing really, really well; think of it as being closer to the ideal product, where people currently face a mismatch cost. This will remove the existence of most 'general' apps, which is a good thing imo.

> Show the receipts. Where are the mobile apps, the Photoshop replacements, the video and audio editors, the games and game engines that took a decade to make in the past that have shipped since Claude Code came along?

For code in general, see the various meanings of "I am {insert number here} times as productive" in Figure 9 on page 36: https://www.nber.org/system/files/working_papers/w35275/w352...

In the same document, Figure 12 on page 41 shows a significant spike in iOS apps, but users per app are also way down (which you should expect, given that this makes it possible for low-user-count apps to be sensible business propositions).

How many people care to spend a year making a replacement for something that took a decade? Photoshop, despite the complaints about price and subscription model, just isn't expensive enough to justify one engineer-year to replace. Unreal and Unity are free for a lot of people using them, and likewise are not worth the cost of replacing for those who do end up paying (because the teams using them know how to use them and don't want to be retrained).

For this reason, you should be looking at things which would have taken a year of human time but now take a month, or faster, so less Photoshop 27.7 (today) and more back when it was still called ImagePro (1988).

For games: I've seen games like the one below take a month or more to get good enough to be interesting, and yet this one took me two prompts, the second of which was the single word "continue" (and I only needed that because I was on the free tier and used too many tokens). I didn't bother to look at the code, and I don't care about the code; making the app itself was as easy as finding an app like this on the Apple App Store, even though such an app was in one of the top-10 lists: https://github.com/BenWheatley/Piano-Trainer

Is this game "as good as" the one on the App Store? Who cares. Any random person who wants their own app can now get their own custom version doing the specific things they care about, which doesn't need to simultaneously support all the use cases of all the other people who would buy the app on the App Store.

From what I read on Hacker News comments, the same is happening with video editing, where it's not "Make an iMovie clone" (why would you, iMovie is free), but rather every time you need one specific thing, you ask your LLM of choice for a solution, and it gives you a shell script which calls ffmpeg with the right arguments.
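For a sense of what such a one-off looks like, here is a minimal Python sketch that builds (but does not run) the kind of ffmpeg call an LLM might hand back for "cut 10 seconds out of this video". It assumes ffmpeg is installed and on PATH; the helper name `trim_command` and the file names are hypothetical. The flags themselves are standard ffmpeg options: `-i` (input), `-ss` (start offset), `-t` (duration), and `-c copy` (stream copy without re-encoding).

```python
import shlex


def trim_command(src: str, dst: str, start: str, duration: str) -> list[str]:
    """Build an ffmpeg argv that copies `duration` of `src`, starting at `start`, into `dst`."""
    return [
        "ffmpeg",
        "-i", src,        # input file
        "-ss", start,     # start offset, e.g. "00:01:30" or "90"
        "-t", duration,   # how much to keep, e.g. "10" (seconds)
        "-c", "copy",     # copy streams as-is: fast, but cuts may not be frame-accurate
        dst,              # output file
    ]


if __name__ == "__main__":
    cmd = trim_command("input.mp4", "clip.mp4", "00:01:30", "10")
    print(shlex.join(cmd))  # prints: ffmpeg -i input.mp4 -ss 00:01:30 -t 10 -c copy clip.mp4
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Dropping `-c copy` makes ffmpeg re-encode, which is slower but lets the cut land on the exact requested time rather than wherever the stream-copied packets happen to fall.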

There’s been a massive spike in released code and apps in the last year. If you’re asking for receipts and expecting none then you’re not paying attention.