HN Mirror

by wokwokwok 709 days ago

> I've written a lot of tests, I think this would have taken 3-4x longer to do by hand. Surely an hour?

I guess my point is I'm skeptical.

I don't believe what you had at the end would have taken you that long to do by hand. I don't believe it would have taken an hour. It certainly would not have taken me or anyone on my team that long.

I feel like you're projecting that if you scale this process to, say, 5 LLMs running in parallel, you would spend maybe 20% more time reviewing 5 PRs instead of 1, but get 5x as much stuff done in the end.

Which may be true.

...but, and this is really my point: It's not true, in this example. It's not true in any examples I've seen.

It feels like it might be true in the near-to-moderate future, but that rests on a lot of underlying assumptions:

- LLMs get faster (probably)

- LLMs get more accurate and less prone to errors (???)

- LLMs get more context size without going crazy (???)

- The marginal cost of doing N x code reviews is < the cost of just writing code N times (???)

These are assumptions that... well, who knows? Maybe? ...but right now? Like, today?
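
To make that last assumption concrete, here's a rough sketch of the break-even condition. The parameter names (`review_minutes`, `write_minutes`, `rework_rate`) and the simple linear cost model are my own assumptions for illustration, not measurements of anything:

```python
def llm_workflow_minutes(n_tasks, review_minutes, rework_rate, write_minutes):
    """Human time when an LLM drafts each task and a person reviews it.

    Hypothetical linear model: every draft gets reviewed, and a fraction
    `rework_rate` of drafts is thrown away and rewritten by hand.
    """
    review = n_tasks * review_minutes
    rework = n_tasks * rework_rate * write_minutes
    return review + rework


def by_hand_minutes(n_tasks, write_minutes):
    """Human time when a person simply writes every task themselves."""
    return n_tasks * write_minutes


def reviewing_beats_writing(n_tasks, review_minutes, rework_rate, write_minutes):
    """True when reviewing N LLM drafts costs less than writing N tasks by hand."""
    llm = llm_workflow_minutes(n_tasks, review_minutes, rework_rate, write_minutes)
    return llm < by_hand_minutes(n_tasks, write_minutes)
```

Under this (deliberately crude) model, N cancels out: the comparison reduces to `review_minutes < (1 - rework_rate) * write_minutes`. So running more LLMs in parallel only pays off if reviewing a single draft is already cheaper than writing it, which is exactly the part marked (???).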

The problem is: if it were actually making people more productive, then we would see evidence of it. Like, actual concrete examples of people having 10 LLMs building systems for them.

...but what we do see is people doing things like this, which seem (to me at least) either worse than or on par with just doing the same work by hand.

A different workflow, certainly; but not obviously better.

LLMs appear to have an immediate, right-now disruptive impact on particular domains like, say, learning, where it's extremely clear that having a wise coding assistant to help you gain simple cross-domain knowledge is highly impactful (look at Stack Overflow). But despite all the hand waving and all the people talking about it, the actual concrete evidence of a 'Devin' that actually builds software, or even meaningfully improves programmer productivity (not 'is a tool that gives some marginal benefit to existing autocomplete'; actually improves productivity), is ...

...simply absent.

I find that problematic, and it makes me skeptical of grand claims.

Grand claims require concrete tangible evidence.

I've no doubt that you've got a workflow that works for you, and thanks for sharing it. :) ...I just don't think it's really compelling, currently, for most people to work that way; I don't think you can reasonably argue it's more productive, or more effective, based on what I've actually seen.