by Our_Benefactors
268 days ago
Go ahead and move the goalposts now... This took about 2 minutes of research to support the conclusions I know to be true. You can waste time as long as you choose in academia attempting to prove any point, while normal people make real contributions using LLMs.

### An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation
We evaluate TESTPILOT using OpenAI’s gpt3.5-turbo LLM on 25 npm packages with a total of 1,684 API functions. The generated
tests achieve a median statement coverage of 70.2% and branch coverage of 52.8%. In contrast, the state-of-the-art feedback-directed
JavaScript test generation technique, Nessie, achieves only 51.3% statement coverage and 25.6% branch coverage.
- *Link:* [An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation (arXiv)](https://arxiv.org/abs/2302.06527)

---

### Field Experiment – CodeFuse (12-week deployment)
- Productivity (measured by the number of lines of code produced) increased by 55% for the group using the LLM. Approximately one third of this increase was directly attributable to code generated by the LLM.
- *Link:* [CodeFuse: Generative AI for Code Productivity in the Workplace (BIS Working Paper 1208)](https://www.bis.org/publ/work1208.htm)
This is a terrible way to do research!