by obirunda 624 days ago
I think at its core it's not that there isn't value or future value, but currently there is an assertion, maybe some blind faith, that it's inevitable that a future version will deliver a free lunch for society.

I think the testimony often repeated by coders who use these code completion tools is "it saved me X amount of time on this one problem I had, therefore it's great value". The issue is that each of these amounts to a study with n=1 test subjects. It's only useful information for the subject. We don't seem to realize in these moments that when we use those examples, even to ourselves, we are users reviewing a product, not validating whether our workflow is objectively better rather than just different.

The truth lies in the aggregate data of the quality and crucially the speed by which fixes and requirements are being implemented at scale across code bases.

Admittedly, a lot of code is being generated, so I don't think I can say everyone hates it, but until someone can do some real research on this, all we have are product reviews.

1 comments

> I think at its core it's not that there isn't value or future value, but currently there is an assertion, maybe some blind faith, that it's inevitable that a future version will deliver a free lunch for society.

To me it seems very much like we're somewhere near the peak of the hype cycle: https://en.wikipedia.org/wiki/Gartner_hype_cycle

Except in the case of "AI" we get new releases that seem somewhat impressive and therefore extend the duration for which the inflated expectations can survive. For what it's worth, stuff like this is impressive https://news.ycombinator.com/item?id=41693087 (I fed my homepage/blog into it and the results were good, both when it came to the generated content and the quality of speech)

> The truth lies in the aggregate data of the quality and crucially the speed by which fixes and requirements are being implemented at scale across code bases.

Honestly? I think we'll never get that, the same way I cannot convincingly answer "How long will implementing functionality X in application Y with the tech stack Z for developer W take?"

We can't even estimate tasks properly, and we don't have metrics for specific parts of the work (how long creating a front end takes, how long for a back end API, how long for the schema and DB migrations, how long for connecting everything, adding validations, adding audit, fixing bugs etc.) because in practice nobody splits them up in change management systems like Jira so far. Nor are any time tracking solutions sophisticated enough to figure those out and also track how much of the total time is just procrastination or attending to other matters (uncomfortable questions would get asked then, and the metrics themselves would get optimized for).

So the best we can hope for is some vague "It helps me with boilerplate and repeatable code which is most of my enterprise CRUD system by X% and as a result something that would take me Y weeks now takes me Z weeks, based on these specific cases." Get enough of those empirical data points and it starts to look like something useful.
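
As a rough sketch of what pooling those data points could look like (the report list, the function names and every number below are made up for illustration, not real measurements):

```python
from statistics import mean, median

# Hypothetical self-reported cases: (weeks without the tool, weeks with the tool).
# These numbers are invented for illustration only.
reports = [
    (4.0, 3.0),
    (2.0, 2.0),
    (6.0, 4.5),
    (1.0, 1.2),  # a case where the tool made things slower
]

def percent_saved(before_weeks, after_weeks):
    """Relative time saved for one report, as a percentage of the original duration."""
    return 100.0 * (before_weeks - after_weeks) / before_weeks

def summarize(cases):
    """Aggregate individual reports into a few summary statistics."""
    savings = [percent_saved(before, after) for before, after in cases]
    return {
        "n": len(savings),
        "mean_pct_saved": mean(savings),
        "median_pct_saved": median(savings),
        "worst_pct_saved": min(savings),
    }

summary = summarize(reports)
print(summary)
```

Even a toy summary like this makes the spread visible: the mean and median can look decent while the worst case is a net loss, which a single "it saved me time" anecdote would never show.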

I think lots of borderline scams and/or bad products based on overblown promises will get funded, but in a decade we'll probably mostly have those sticking around that have actual utility.

The top comment of your HN link is exactly the issue at hand

> don't know what I would use a podcast like this for, but the fact that something like this can be created without human intervention in just a few minutes is jaw dropping

AI has recently gotten good at doing stuff that seems like it should be useful, but the limitations aren’t obvious. Self-driving cars, LLMs, Stable Diffusion etc. are awesome tech demos as long as you pick the best output.

The issue is the real world cares a lot more about the worst outcomes. Driving better than 99% of people 24/7 for 6 months and then really fucking up is indistinguishable from being a bad driver. Code generation happens to fit really well because of how people test and debug code, not because it’s useful unsupervised.

Currently, balancing supervision effort vs time saved depends a great deal on the specific domain and very little on how well the AI has been trained. That’s what is going to kill this hype cycle. Investing an extra 100 billion training the next generation of LLMs isn’t going to move the needles that matter.
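
To make the supervision vs time saved trade-off concrete, here is a minimal sketch. The two domains and all the hour figures are hypothetical, picked only to show how the same raw time saving can flip sign once review effort is counted:

```python
# Hypothetical per-task numbers in hours; invented for illustration only.
domains = {
    "CRUD boilerplate": {"time_saved": 3.0, "supervision": 0.5},
    "safety-critical code": {"time_saved": 3.0, "supervision": 4.0},
}

def net_hours_saved(time_saved, supervision):
    """Time the tool saves minus the time spent reviewing and fixing its output."""
    return time_saved - supervision

for name, hours in domains.items():
    net = net_hours_saved(hours["time_saved"], hours["supervision"])
    verdict = "worth it" if net > 0 else "not worth it"
    print(f"{name}: net {net:+.1f} h ({verdict})")
```

In this toy setup the model's output is equally good in both domains; only the supervision cost differs, and that alone decides whether the tool pays off.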