by niwtsol 23 days ago
100% - I think that is also part of the divide you see online: devs who work on massive codebases w/ 100s of engineers and see the bugs the LLMs create vs devs who work on smaller codebases w/ small (<5 person) teams.

It's a tradeoff.

Generating a feature that is 90% correct in a tenth of the time is a reasonable tradeoff if you're trying to gain traction.

Generating a feature that is 90% correct in a tenth of the time, risking a multi-billion-dollar business, is a terrible tradeoff.

I think it's rather:

Small teams building continuously get to write features that are 90% correct in a tenth of the time.

Big enterprises get to write features that are 90% correct barely twice as fast, because the bottlenecks lie elsewhere. They also spend more on AI per user because of the internal dynamics pushing people to adopt AI irresponsibly. They correct the 10% of errors more slowly than small teams because of bureaucracy, which raises the cost of errors that show up in the product. Furthermore, they have less to gain from a given amount of speedup because they already had plenty of engineering velocity compared to small teams.
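
To put rough numbers on the "barely twice as fast" point, here is a back-of-the-envelope sketch using Amdahl's law. The coding fractions (1.0 for an idealized small team, 0.55 for an enterprise) are made-up illustrative values, not measurements, and `overall_speedup` is just a hypothetical helper name.

```python
def overall_speedup(coding_fraction, coding_speedup):
    """Amdahl's-law estimate of end-to-end speedup.

    coding_fraction: share of total delivery time spent writing code (assumed).
    coding_speedup:  how much faster the coding part gets with AI (assumed).
    """
    # Time that is not coding (reviews, approvals, meetings) stays the same;
    # only the coding share shrinks.
    remaining_time = (1.0 - coding_fraction) + coding_fraction / coding_speedup
    return 1.0 / remaining_time


# Idealized small team: essentially all of the time is spent coding.
small_team = overall_speedup(coding_fraction=1.0, coding_speedup=10)

# Hypothetical enterprise: only a bit over half of the time is coding.
enterprise = overall_speedup(coding_fraction=0.55, coding_speedup=10)

print(f"small team: {small_team:.2f}x, enterprise: {enterprise:.2f}x")
```

Under these assumptions the small team gets the full 10x while the enterprise lands just under 2x, even though both write code ten times faster.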

I don't think big enterprises will start winning from AI technology until AI truly can automate almost everything in a company and let said company outproduce competitors by burning tokens alone. That's nowhere near possible right now.

I don't think "90% correct in a tenth the time" is really the tradeoff. For a well-specified task, it's closer to 100% correct in 1/100th the time. But that undersells the time required to generate a good spec.

For under-specified tasks, it's not really accurate to talk about "correctness," because the machine isn't psychic. I would suggest that, given a high-level feature request like "add streaming support," it's more about acceptance probability. With a well-structured, well-documented codebase and a reasonably sized feature request, there might be an 80% chance it will generate something which is 100% acceptable. But there's about a 99% chance it will generate something which is acceptable after 1-2 revisions.
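
As a sanity check on those two numbers, here is a minimal sketch that assumes each attempt (the first generation and every revision) is accepted independently with the same 80% probability. That independence is my assumption, not something the figures above guarantee; in practice a revision is informed by feedback and probably does better. `acceptance_after` is a hypothetical helper name.

```python
def acceptance_after(revisions, p_accept=0.8):
    """Chance of an acceptable result within `revisions` follow-up attempts.

    Assumes every attempt is accepted independently with probability
    p_accept (an illustrative simplification).
    """
    attempts = revisions + 1  # the first generation plus each revision
    # Probability that all attempts fail, subtracted from 1.
    return 1.0 - (1.0 - p_accept) ** attempts


for revisions in range(3):
    print(revisions, round(acceptance_after(revisions), 3))
```

Under that assumption, zero revisions gives 0.8, one gives 0.96, and two give 0.992, which is consistent with "about 99% after 1-2 revisions."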