Hacker News
by hansvm 39 days ago
I haven't yet seen anyone with a concrete example project (public ideally, but even describing private efforts in enough detail to enable potential criticism would be fine) making a claim as strong as 10x. Are you willing to break the mould and show us what we're all missing?
14 comments

10x what? 10x revenue? 10x features shipped? What's the measure? Is it 10x speed of dev, like the parent comment? Because an unqualified 10x could mean 10x SLOC, which is trivial with an agent but has negative value.

Assuming 10x on the speed of dev, is the vscode repo a decent example? Recently they've been all in on AI-augmented development, so I'm thinking they'd be a reasonable subject.

How do you isolate out what counts as the "development" part of their delivery cycle (is that the dev inner loop, does that show up in frequency of commits then?) to measure it and see if it's running 10x?

https://github.com/microsoft/vscode/graphs/contributors?from...

Guarantee it’s the same story I get from all my friends/coworkers who are now 10x… they are 10 times faster at starting random projects that get to 80% done and that they can’t finish, so they immediately move on to the next project because their velocity is so high.
From a software quality or software engineering POV --> this is clearly not building durable value, not scalable, etc. So I'd agree with you there.

But from the POV of, say, a young startup company looking for PMF and navigating the ambiguities involved with trying to figure out what is the "right thing" to build that will appeal/delight/convince-people-to-pay --> being 10X faster at shipping 80% done projects is actually an incredibly, unfathomably valuable superpower. And it is also rationally the "right thing" to do, to make lots of cheap bets and fail/learn fast.

I find that many folks on my team (I am a manager/leader of a small-to-mid-size eng org) struggle to accept the difference between two kinds of projects (where the same team may need to do both kinds of work, all the time and in parallel):

- "Hey, the company needs, and you and I both agree, that this situation calls for building/renovating a skyscraper --> please design a fucking strong/safe/reliable skyscraper and don't take any shortcuts, this requires 'real' engineering"

- Vs, "Hey, the company isn't sure what it needs, and neither you nor I know any better either, so let's try a bunch of different shacks/sheds/treehouses/whatever, until we find something that has traction / makes us money (and it's okay if the shed collapses -- so long as the business knows this too, that it wasn't meant to be a load-bearing, skyscraper-esque thing anyways)"

I won't get into the rabbit hole of talking about dealing with bad business leaders, who want a skyscraper but expect to pay the price of a shack/shed. Let's assume that we are talking about the type of companies (maybe the minority) that are reasonable enough to know and acknowledge the difference. Then what is the game-theoretic/rational thing for them to do, and how does this 10X idea express itself? That's where my argument is coming from.

It's 10x code generation with 0.5x quality at best, and all other parts of the SDLC are at 1.x or worse.
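
One way to put rough numbers on this is Amdahl's law: if only the code-writing share of the delivery cycle gets faster, the untouched remainder caps the overall gain. A minimal sketch follows. The function name is made up for illustration, and the 30% coding share is purely an assumed figure, not something measured in this thread.

```python
def overall_speedup(coding_fraction: float, coding_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only one part of the work gets faster."""
    if not 0.0 <= coding_fraction <= 1.0:
        raise ValueError("coding_fraction must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    untouched = 1.0 - coding_fraction  # reviews, planning, ops, etc. stay at 1x
    return 1.0 / (untouched + coding_fraction / coding_speedup)


# Assumed split: 30% of delivery time is writing code, and that part gets 10x faster.
print(round(overall_speedup(0.30, 10.0), 2))  # 1.37
# Even if code became nearly free, the other 70% still bounds the gain.
print(round(overall_speedup(0.30, 1e9), 2))   # 1.43
```

Under that assumed split, a 10x gain in code generation shows up as well under 2x end to end, before accounting for any drop in quality.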

AI is not delivering 10x shareholder value, anywhere. Software developers have quite the level of hubris about how important they are to companies. Yes our work is very complex and takes a certain mindset to do it well. It takes a lot of other roles to have a successful business, many of those roles will use AI to help draft slide decks, emails, etc. and that's the limit for them.

Look at recent companies doing layoffs and claiming it's because of AI, like Cloudflare and Coinbase: do their reported financials paint the picture that they are crushing it with AI? No, it's net losses into the hundreds of millions of dollars.

> AI is not delivering 10x shareholder value, anywhere.

A bit facetious, but I'd expect Nvidia and the like providing the "AI equipment" to have a 10× share value at least…

As always, the real money in a gold rush goes to the people selling shovels.
It's more like ∞x (or N/Ax if you prefer) because the majority of the projects I did with LLM agents wouldn't have existed without them, because I would've never found enough time to work on them.

One of the latest things I made with Claude was a tool that allowed me to move a bunch of very low traffic Cloud Run services to a single VPS without losing any of the Cloud Run benefits such as easy Docker-based deployment and automatic certificate provisioning. I thought about making something like that for quite some time, and Claude finally made it possible, which makes me quite happy.

The fun thing here is that no other soul genuinely cares about it, or any other code I might publish. The code, especially AI generated, is so cheap that if anyone wants to repeat my steps to get rid of Cloud Run services, they will probably vibe-code their own tool instead of figuring out how to use mine, just like I did that instead of spending time on learning Dokku or similar solutions.

So, yes, 10x and more, but no one cares about the result, which makes the whole 10x measurement less useful.

There have been plenty of libraries/tools that got people from "I would like to do foo" to "oh, this tool makes foo possible in under 10 hours, I should start working on foo" in the past. LLMs have done this for many foos, which is great.

But I'm with hansvm - I haven't actually seen anyone plausibly maintain 10x. 10x is different from getting people past their activation cost.

>or N/Ax if you prefer

It's not a matter of preference

The incredulity at 10x claims is often unearned because how much do these skeptics actually notice and appreciate the depth of work of ten developers collaborating on something (if not their own org)? Dev output slips by quietly. There are reams of unnoticed projects even at the scale of a life’s work.
This doesn't pass a sniff test at any small organization. And wouldn't these devs see this 10x claim themselves?
I'm assuming the devs are seeing 10x code generation and equating that to the improvement.

It's when they practically ignore the rabbit holes that it's suspect. I'm definitely seeing speedups. I troubleshot a Linux system yesterday with minimal effort using a local LLM. It likely would have taken me a few hours to locate all the docs & testing procedures. The LLM did it with only a few prompts. To ensure it did it correctly, I had to interrogate it a few times before letting it proceed.

Humans make really bad scientists, and it takes a lot of effort to properly catalog and provide statistics for these things.

There is an improvement, but I doubt any random dev can give a real estimate, since before LLMs they couldn't really give you a real estimate anyway. I do know that when I encounter a bug now, debugging is almost immediately possible.

My small organization is noticing output increasing. We're excited about it. I’m not sure about 10x… Like others have mentioned, it’s difficult because you have to measure different workloads.

I build things I never would have. My tooling is better and more robust than ever. I verify and test my work better than ever. I fix more bugs than I used to simply because no one needs to care if it fits into a cycle. I explore and solve more problems in more parts of the application, even if I don’t write code. I take better care of our infrastructure. Performance goes up, bugs go down, AWS resources scale back, costs go down. I’ve paid for my AI usage in scaled back resources several times over at this point.

It might not be 10x but it’s a significant multiple.

I am building a better interface for managing KNX systems than the ETS6 software. Code is here: https://github.com/jgrahamc/koolenex

1. I would not have attempted this without AI assistance because it's a big project.

2. I have built a functional program that I am able to use for real work in a handful of weeks, working part time on this (like literally a few hours per day prompting Claude and Kimi).

3. Had I decided to do this without AI assistance it would have been months of work.

I suspect that the goalposts for AI-assisted coding will be moved the same way they've been moved for the Turing Test.

The Turing Test used to matter until it didn't (does anyone even talk about it? was there a big news conference when it was solved?). Likewise every time it becomes easier to ship software, the bar will be pushed higher by sceptics. Ultimately the gatekeeping is going to become meaningless as software becomes "too cheap to meter".

Here is a public project.

https://github.com/KeibiSoft/KeibiDrop

Two years ago it took me around 2k hours to build a cross-platform FUSE vault, without using AI-assisted tools.

The pain was debugging through logs and system traces. And understanding how things work.

Now I managed to ship this one much faster, as an after-hours project. I started it in May 2025, and around the end of November 2025 started using Claude on it.

Just dumping logs into Claude and explaining the attack vector for the problems saved me the FML moments of grinding through walls of syscalls on 3 platforms.

I would say it is much easier to make progress and ship with the same rigour, while minimizing the time, focus and brain power involved, so that I can put the energy somewhere else.

Yep, the real strength of AI is less in replacing engineering skills, it's more in slashing all the time we spend not using those skills and doing low level research and data correlation tasks instead. Which isn't to say that those tasks aren't valuable in their own way, but in terms of raw output...
I long for the day when they will supervise CI/CD systems.

Trying to fix syntax errors in string interpolation on a 5-minute-delay loop is hell.

Just create a skill for it -> I call mine `babysit`. It spins up a subagent that polls it every x minutes and auto-fixes it until it's green. I already continue with the next task while it does that in the background
I do this with our AI PR review checks. We have AI review every PR and commits to PRs... which can cause long running loops of commit<>fix.

So my agent just listens for green checks and no PR comments and loops until those conditions are met.
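
The loop described in the last two comments can be sketched roughly as follows. This is only an illustration of the idea, not anyone's actual skill: `babysit`, `checks_green`, `open_comments` and `attempt_fix` are hypothetical hooks that you would wire to whatever CI and review tooling you use, and the round cap is an added assumption so that a loop that never converges stops and hands back to a human.

```python
import time
from typing import Callable


def babysit(
    checks_green: Callable[[], bool],
    open_comments: Callable[[], int],
    attempt_fix: Callable[[], None],
    poll_seconds: float = 300.0,
    max_rounds: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until CI is green and no review comments remain, or give up.

    Returns True on success, False once the round budget is exhausted.
    """
    for _ in range(max_rounds):
        if checks_green() and open_comments() == 0:
            return True
        attempt_fix()        # hypothetical: hand the failure log to an agent
        sleep(poll_seconds)  # wait for the next CI run to finish
    return checks_green() and open_comments() == 0


# Usage with fake hooks: two failures that each "fix" resolves in turn.
state = {"failures": 2}
ok = babysit(
    checks_green=lambda: state["failures"] == 0,
    open_comments=lambda: 0,
    attempt_fix=lambda: state.update(failures=state["failures"] - 1),
    poll_seconds=0,
)
print(ok)  # True
```

Passing the hooks in as plain callables keeps the loop testable without a real CI system.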

About 30% of our AI PR review checks are flawed at a fundamental level, based on poor comprehension of the whole. If I told the AI "fix everything and keep going until it's green" I would be terrified of the result.

I disbelieve this works in anything other than a toy codebase (or an incredibly fine-grained microservice).

The 70% is amazing! But a 30% failure rate requires intense supervision.

It is possible. I tell the agent to use a CLI app, add a timer, and check the status once in a while. Especially if there is something with a long wait. Also, if it can run some validators or the same tools locally, it would be much faster.

It might tend to deviate and waste time, so it needs guiding once in a while: check what it is spewing out and point it in the correct direction.

I treat the low-level tasks as building blocks. You need a grasp and understanding of what is possible with them, but you do not need to remember the exact byte order and syntax. I think the idea is you should structure your workflow in a deterministic way, and just use Claude or another LLM as the interface. It is much easier and more enjoyable to use high-level language, where you give pointers to building blocks, give directions, and say a hard no when you see things deviate.

If I had to output the code myself, it would take around 8 hours of constant writing to get around 1k LoC. For tricky FUSE-level stuff, I might need to spend 3 weeks on 10 LoC. It is very easy to burn out and build up pain.

My friend who has never taken a programming class (or even touched an IDE before AFAIK) has now put a small app into production.

Complete frontend + backend + database.

Yes, it is an internal app, but it works and everyone loves it.

Does that count as an example?

(Also I absolutely expect him to need help at some point, but so far it has taken his project from absolutely impossible to 3 weeks of work in between work, renovating his house and being a dad for the first time so I was very impressed.)

These kinds of things (internal tools created out of band of normal engineering practices by non-engineers) were amazing back when I did pentesting, because security was always the last consideration. That got harder when SaaS became preferable to rolling your own stuff for everything. Guess things are gonna get fun again for red teams lol
I agree with you.

The danger, however, is not only that people write their own tools for calculations, capacity planning, etc.

The danger is that people make useful stuff that is perfectly fine as long as it is just an internal tool, but then someone adds credentials to other systems so it can access and maybe even update stuff, it gets exposed to third parties, and all of a sudden we have a major data breach going on.

My friend who has never taken a programming class did the same 10 years ago.
Obviously it was not the same, since models that could write programs were not available 10 years ago.

And if you just mean your friend taught himself programming on his own, well, that is actually very cool. I did too, back in the 90s, and so did many others here.

My point is that it is now possible to vibe code a full application from frontend to backend today and still not be able to understand a line of TypeScript or anything else.

20 years ago we had cPanel, where people without programming knowledge could instantly deploy their own apps. Many people became millionaires doing that, I'd say proportionally probably way more than the people vibecoding solutions. This is not as revolutionary as we thought.
I'm building an open source Google Photos alternative. Have a look at my project and tell me if you think you could do all this at the same speed without an LLM: https://opennoodle.de

Direct github link: https://github.com/open-noodle/gallery

It doesn't completely blow away your argument, but you forked Immich and gave it to an LLM. Which is arguably slightly easier than starting from scratch.

Nothing wrong with forks though.

Sounds like you're nitpicking on whether or not LLMs make you faster by saying that working on existing code is easier than starting from scratch? I think you've missed the point.
No, they're saying that it was misleading. OP said "I'm building an open source Google Photos alternative" and surprisingly didn't say "based on Immich". This dramatically changes the evaluation of "open-noodle".

We're now in an era where LoC is easy and design is hard[1]. Starting with an existing project means using an existing design, where someone else has already made many/most of the difficult decisions.

10Xing code without caring about design/UX/DX is trivial. Literally anybody with a token budget can do it. But they probably won't ship a good project. Not with current frontier models.

[1]: design has always been hard. But now it's even more difficult because of code velocity and because LLMs are happier to work with bad code than humans. It's never been easier to go deep into rabbit holes without noticing a single issue.

Pretty much every example in this thread is "I forked some existing project and made changes I like".

The main things they don't realize are: 1. These are mostly superficial changes. 2. The only thing they 10xed is their ability to "start" on something. 3. They have not produced actual value. Their project/fork is just a version they think they prefer. But it is less maintainable, and less robust/useful for others due to its specificity.

My observation is that these arguments are consistently made by inexperienced devs who simply don't understand what it takes to produce value in the real world.

LLMs CAN 10x you (in very specific areas like prototyping), IF you understand how to deliver this value, but that is the hard part. It has always been the hard part.

What are you debating here? I replied to a comment asking for examples of public work displaying that LLMs can 10x a human's output.

https://opennoodle.de/roadmap/

Look at what I built, these are not all simple designs.

Just that you're treating LoC and complexity as useful metrics, when these days those come for free. Or for $3 per Mtok, almost free.
I have a programming language project: lone lisp. It's on GitHub. I asked Claude to analyze the commit history before and after AI and the result was: with AI I'm committing around 8x faster than before. Lone experienced a literal explosion of features, improvements and fixes after I subscribed to Claude.

And this is the project I care most about. I review every single line of code the AI outputs. I push back on everything it does. I reword every commit message. I make every effort to understand how everything works before committing it to master. I cook branches for days and I don't merge until I think it's perfect.

Still gave me an ~8x improvement.
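
For anyone who wants to check a figure like this on their own history, here is a minimal sketch of one way to compute it: commits per day after a cutoff date divided by commits per day before it. The function name and the dates are made up for illustration; in practice the dates would come from something like `git log`. Commit count is only a proxy, and as noted elsewhere in this thread, volume is not the same as value.

```python
from datetime import date, timedelta


def commit_rate_ratio(commit_dates: list[date], cutoff: date) -> float:
    """Commits per day on/after `cutoff` divided by commits per day before it."""
    before = [d for d in commit_dates if d < cutoff]
    after = [d for d in commit_dates if d >= cutoff]
    if not before or not after:
        raise ValueError("need commits on both sides of the cutoff")
    days_before = (cutoff - min(before)).days    # span from first commit to cutoff
    days_after = (max(after) - cutoff).days + 1  # span from cutoff to last commit
    return (len(after) / days_after) / (len(before) / days_before)


# Made-up history: 4 commits in the 40 days before the cutoff, 8 in the 10 days after.
cutoff = date(2025, 3, 1)
history = [cutoff - timedelta(days=n) for n in (40, 30, 20, 10)]
history += [cutoff + timedelta(days=n) for n in (0, 1, 2, 3, 5, 7, 8, 9)]
ratio = commit_rate_ratio(history, cutoff)
print(round(ratio, 1))  # 8.0
```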

The latest development: shaped objects, like Self and V8. I asked Claude about it and it just implemented it in like 10 minutes. Boom, instant ~20% speed improvement. I read the code and it turned out to be almost obnoxiously simple. It basically converts hash tables into arrays internally, and deoptimizes back to a hash table if anyone deletes keys from it. I'm still reeling from the sheer absurdity of it.
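
To make the description above concrete, here is a toy sketch of the general shaped-object idea: objects built with the same keys in the same order share one key-to-slot layout and keep their values in a flat array, and a delete falls back to a plain hash table. It is not lone lisp's actual code, which is not shown in this thread, and all class and method names are invented for illustration. In Python the layout lookup is itself a dict, so this shows the structure only, not the speedup.

```python
class Shape:
    """Key -> slot-index layout, shared by objects built the same way."""

    def __init__(self, index=None):
        self.index = index or {}  # key -> position in the slots array
        self.transitions = {}     # key -> Shape reached by adding that key

    def with_key(self, key):
        # Reuse the cached transition so equal insertion orders share a shape.
        if key not in self.transitions:
            self.transitions[key] = Shape({**self.index, key: len(self.index)})
        return self.transitions[key]


EMPTY_SHAPE = Shape()


class ShapedObject:
    def __init__(self):
        self._shape = EMPTY_SHAPE
        self._slots = []    # values stored by slot index
        self._table = None  # dict fallback, used only after deoptimization

    def set(self, key, value):
        if self._table is not None:
            self._table[key] = value
        elif key in self._shape.index:
            self._slots[self._shape.index[key]] = value
        else:
            self._shape = self._shape.with_key(key)
            self._slots.append(value)

    def get(self, key):
        if self._table is not None:
            return self._table[key]
        return self._slots[self._shape.index[key]]

    def delete(self, key):
        if self._table is None:
            # Deoptimize: a delete would leave a hole in the slot layout.
            self._table = {k: self._slots[i] for k, i in self._shape.index.items()}
            self._shape, self._slots = None, None
        del self._table[key]


a, b = ShapedObject(), ShapedObject()
for obj in (a, b):
    obj.set("x", 1)
    obj.set("y", 2)
print(a._shape is b._shape)  # True
a.delete("x")
print(a.get("y"), a._table)  # 2 {'y': 2}
```

In a real VM the gain would come from remembering a key's slot index at each access site once the shape is known, which this sketch leaves out.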

It is worth taking a detailed look at the original essay and its arguments. My interpretation is here: https://smartmic.bearblog.dev/no-ai-silver-bullet/
I can describe private efforts about a couple recent projects that made me finally believe that Claude Code can actually be a 10x multiplier on certain work.

We decided to integrate our SaaS into Microsoft Business Central and NetSuite as plugins into those systems. BC has its own programming language, called AL, that has a lot of idiosyncrasies compared to any other language I've worked with. And NetSuite plugins are written in SuiteScript, which is a custom JS runtime with a ton of APIs to learn.

In the "before", it would've taken 5 developers a year or more to build those integrations. I did both by myself in well under a year. Thank you Claude.

I can state one thing that I'm sure a lot of people connect with, and I don't know if this is 10x, probably not.

I've always been a backend engineer, never front end. And almost every team I've been on has lacked any front end skills at all, so all our tools end up being a mash of scripts, maybe sometimes an API.

Now we are all front end engineers creating UIs for things we could never do before, and this starts API first development, so the CLI + UI are just calling APIs. Nothing new here, but this used to be what teams do, now a single person does it.

It would not have taken you long to learn frontend if you wanted to. Now you can use AI to generate it but you don't understand any of the generated code.
It's often much more than 10x, because AI makes things so much faster and easier that we are now doing many things we wouldn't have bothered to do at all before. And there's no reason to believe we're near the finish line yet as to how good LLMs can get.
It's a 10x in areas where I'm not very good/experienced. Or perhaps even more. For example, I'm mostly a backend developer. I used to be a full-stack dev a long time ago, so I know how to make a front end, but my knowledge there is outdated, I've forgotten half of it, and it takes me time to get back into it. I'm also not good at creating a web UI, so in the past I was using frameworks like Bootstrap or Foundation, leading to sites that looked good but very generic.

Now with AI, I can easily create a nice looking front-heavy web app.

See this for example: https://github.com/erwan/sovereign-cards-database

I would never have bothered doing that without AI.

In areas I'm more familiar with, like back-end software, it's maybe more of the 2x or 3x.

There are some things around the corner. The best engineers aren't going 10x on Claude to build the next goofy SaaS app.
There is verifiably a flood of new projects. The selfhosted subreddit had to put in checks coz too many people were complaining about vibecoded slop.

iOS submissions are way up.

What I haven't seen is existing open source people develop 10x faster.

Or anything vibe coded get popular that wasn't just a gimmick.

Just a tsunami of slop.