	by Our_Benefactors 268 days ago
	The data showed LLMs are better. This put the debate to rest. Now we are post-debate.

3 comments

JohnFen 268 days ago

What data are you talking about? Why do you value it above the data showing the opposite?

snickerbockers 268 days ago

It's superior data because it supports his expectations. His expectations are right because they are based on superior data. Checkmate, Luddites.

Our_Benefactors 268 days ago

Meanwhile, you have furnished zero data that supports your claims. Ho hum.

snickerbockers 267 days ago

Your initial statement is that you are not open to debate, so I don't see what the point would be. Furthermore, you defined "serious inquiries" as synonymous with your own preconceived ideas, so by definition I cannot refute anything you say using a "serious inquiry". Do not interpret this as some sort of compliment or concession, but it is not possible to argue against you.

Even putting the sophistry aside your argument is incomplete because you never defined what "productivity" means in this context or how it can be quantified. I would never dispute that a pseudo-random bullshit generator can shit out javascript faster than any human, but that's not necessarily productive.

lmf4lol 268 days ago

give me one seriously peer reviewed study please with proper controls

i wait

Our_Benefactors 268 days ago

Go ahead and move the goalposts now... This took about 2 minutes of research to support the conclusions I know to be true. You can waste time as long as you choose in academia attempting to prove any point, while normal people make real contributions using LLMs.

### An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation

We evaluate TESTPILOT using OpenAI’s gpt3.5-turbo LLM on 25 npm packages with a total of 1,684 API functions. The generated tests achieve a median statement coverage of 70.2% and branch coverage of 52.8%. In contrast, the state-of-the-art feedback-directed JavaScript test generation technique, Nessie, achieves only 51.3% statement coverage and 25.6% branch coverage.

- *Link:* [An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation (arXiv)](https://arxiv.org/abs/2302.06527)

---

### Field Experiment – CodeFuse (12-week deployment)

- Productivity (measured by the number of lines of code produced) increased by 55% for the group using the LLM. Approximately one third of this increase was directly attributable to code generated by the LLM.
- *Link:* [CodeFuse: Generative AI for Code Productivity in the Workplace (BIS Working Paper 1208)](https://www.bis.org/publ/work1208.htm)

footy 268 days ago

> This took about 2 minutes of research to support the conclusions I know to be true

This is a terrible way to do research!

Our_Benefactors 268 days ago

The point is that the information is readily available, and rather than actually adding to the discussion they chose to crow “source?”. It’s very lame.

capyba 268 days ago

“Productivity (measured by the number of lines of code produced) increased”

The LLMs had better have written more code; they’re text generation machines!

In what world does this study prove that the LLM actually accomplished anything useful?

Our_Benefactors 268 days ago

As expected, the goalposts are being moved.

LOC does have a correlation with productivity, as much as devs hate to acknowledge it. I don’t care that you can provide counterexamples to this, or even if the AI on average takes more LOC to accomplish the same task - it still results in more productivity overall because it arrives at the result faster.

capyba 268 days ago

Nothing about this is moving goalposts - you and/or the person(s) conducting this study are the ones being misleading!

If you want to measure time to complete a complex task, then measure that. LOC is an intermediate measure. How much more productive is "55% more lines of code"?

I can write a bunch of garbage code really fast with a lot of bugs that doesn't work, or I can write a better program that works properly, slower. Under your framework, the former must be classified as 'better' - but why?
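
To make that concrete, here is a minimal sketch in Python. Every number in it is invented for illustration, and the `Task` record and both metric functions are hypothetical names, not anything taken from the study. It only shows that "LOC per hour" and "working tasks per hour" can rank the same two developers in opposite order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One finished piece of work (hypothetical record, made-up fields)."""
    lines_of_code: int
    hours: float
    works: bool  # True if the result actually does what was asked


def loc_per_hour(tasks):
    """Intermediate measure: raw volume of code produced per hour."""
    return sum(t.lines_of_code for t in tasks) / sum(t.hours for t in tasks)


def working_tasks_per_hour(tasks):
    """Outcome measure: tasks that actually work, per hour spent."""
    return sum(1 for t in tasks if t.works) / sum(t.hours for t in tasks)


# Invented data: lots of code quickly, but most of it is broken.
fast_and_buggy = [
    Task(lines_of_code=400, hours=2.0, works=False),
    Task(lines_of_code=350, hours=2.0, works=True),
    Task(lines_of_code=450, hours=2.0, works=False),
]

# Invented data: less code, written more slowly, all of it works.
slow_and_correct = [
    Task(lines_of_code=150, hours=3.0, works=True),
    Task(lines_of_code=120, hours=3.0, works=True),
]

for name, tasks in [("fast_and_buggy", fast_and_buggy),
                    ("slow_and_correct", slow_and_correct)]:
    print(f"{name}: {loc_per_hour(tasks):.1f} LOC/h, "
          f"{working_tasks_per_hour(tasks):.3f} working tasks/h")
```

With these made-up inputs the first developer wins on LOC per hour and loses on working tasks per hour. That says nothing about what a real study would find; it only shows that the two measures are not interchangeable.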

I read the study you reference and there is literally nothing in the study that talks about whether or not tasks were accomplished successfully.

It says:

* Junior devs benefited more than senior devs, then presents a disingenuous argument as to why that's the senior devs' fault (more experienced employees are worse than less experienced employees, who knew?!)
* 11% of the 55% increase in LOC was attributed directly to LLM output
* Makes absolutely no attempt to measure whether or not the extra code was beneficial

Our_Benefactors 268 days ago

Yes, like I said, it’s not hard to provide counterexamples to the idea that more LOC is better, but it’s also missing the forest for the trees to pretend it doesn’t matter at all.

psunavy03 268 days ago

If you are seriously linking "productivity" to "lines of code produced," that tells me all I need to know about your credibility.

Our_Benefactors 268 days ago

Do you think LOC and program complexity are not correlated? You are arguing in bad faith.

psunavy03 268 days ago

Neither has anything to do with the effectiveness of a piece of software or the productivity of the people who created it.

snickerbockers 268 days ago

"the data"
