| HN Mirror



	by bongodongobob 754 days ago
	My point is that you shouldn't expect it to one-shot everything. Have it start by writing a spec, then outline the classes and methods, then write the code, and then feed it the debug output.
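A rough sketch of that staged workflow, in Python (the language the thread uses for its benchmark example). `ask_model` is a hypothetical stand-in for whatever LLM client you use, the prompt wording is made up, and "debug output" is assumed to mean the traceback from running a few assert-style tests. `scripted_model` is a fake that returns a buggy first draft so the feedback loop can be exercised without a real model.

```python
import traceback
from typing import Callable, Optional


def run_tests(code: str, tests: list[str]) -> Optional[str]:
    """Run generated code plus assert-style tests; return a traceback or None."""
    namespace: dict = {}
    try:
        exec(code, namespace)      # define the generated function(s)
        for test in tests:
            exec(test, namespace)  # raises AssertionError on a wrong answer
    except Exception:
        return traceback.format_exc()
    return None


def iterate_until_passing(
    ask_model: Callable[[str], str],  # hypothetical LLM client: prompt -> text
    task: str,
    tests: list[str],
    max_rounds: int = 3,
) -> Optional[str]:
    """Spec -> outline -> code, then feed test failures back to the model."""
    spec = ask_model(f"Write a short spec for: {task}")
    outline = ask_model(f"Outline the classes and methods for this spec:\n{spec}")
    code = ask_model(f"Implement this outline in Python:\n{outline}")

    for attempt in range(max_rounds):
        error = run_tests(code, tests)
        if error is None:
            return code                # all tests pass
        if attempt == max_rounds - 1:
            break                      # out of rounds; don't request another fix
        # Feed the failing code and its traceback back to the model.
        code = ask_model(f"This code fails:\n{code}\n\nError:\n{error}\nFix it.")
    return None


def scripted_model(prompt: str) -> str:
    """Fake model for the demo: buggy first draft, correct after feedback."""
    if prompt.startswith("Write a short spec"):
        return "square(n) returns n squared"
    if prompt.startswith("Outline"):
        return "def square(n: int) -> int"
    if prompt.startswith("Implement"):
        return "def square(n):\n    return n + n"  # deliberate bug
    return "def square(n):\n    return n * n"      # the "fixed" version


result = iterate_until_passing(
    scripted_model, "square a number", ["assert square(3) == 9"]
)
print(result)
```

Here the first draft fails `assert square(3) == 9`, the traceback goes back to the fake model, and the second draft passes. Calling `exec()` on model output is only tolerable in a throwaway sketch like this; anything real should run generated code in a sandbox.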

2 comments

TechDebtDevin 754 days ago

I see your point, but hand-holding isn't really a good way to benchmark a model's coding capabilities.

Closi 754 days ago

That depends on whether benchmarking is the aim, rather than reducing the time it takes to build things.

TechDebtDevin 754 days ago

Well, sure, but that wasn't what we were discussing. The original comment says they use that as their benchmark. While their coding task is a bit complex compared to other benchmarking prompts, it's not that crazy. Here is an example of a prompt used for Python benchmarking, for reference:

https://huggingface.co/datasets/mbpp?row=98

At the end of the day, LLMs in their current iteration aren't intended to do even moderately difficult tasks on their own, but it's fun to query them to see the progress when new claims are made.

Closi 754 days ago

The original comment says nothing about benchmarking; they just say that an AI can’t one-shot their complex task?

amne 754 days ago

When I read

"My favorite thing to ask the models designed for programming is ....... None of them ever get it right"

I read "benchmark".

bottom999mottob 754 days ago

Exactly, expecting one-shot, 100% working code from a single prompt is ridiculous at this point. That's why tools like Aider are so useful: you can iteratively refine the generated code through diffs until it's usable.

TechDebtDevin 754 days ago

Sure, it's impossible at this point, but the point of a benchmark isn't to complete the task; it's to test the model's overall efficacy and to track progress. None of them score 100% even on the simple Python benchmarks, but that doesn't mean we shouldn't measure that capability. But sure, I get it. That's not how they are intended to be used, but that's also not the point the commenter was making.
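For a sense of what measuring that capability looks like, here is a minimal sketch of MBPP-style scoring. It assumes each problem ships with a few assert statements (MBPP entries do include a list of test asserts) and that a problem only counts as solved if all of them pass on the first attempt. The two problems and the helper names `passes_tests` and `pass_rate` are hypothetical, not taken from any official harness.

```python
# Minimal sketch of MBPP-style scoring. Each problem pairs a model's generated
# solution with a list of assert statements; a problem counts as solved only
# if every assertion passes. exec() is used for brevity and is NOT safe for
# untrusted model output; real harnesses sandbox this step and add timeouts.


def passes_tests(solution_code: str, test_list: list[str]) -> bool:
    """Return True if the solution satisfies every assert in test_list."""
    namespace: dict = {}
    try:
        exec(solution_code, namespace)  # define the candidate function(s)
        for test in test_list:
            exec(test, namespace)       # raises AssertionError on failure
    except Exception:
        return False                    # syntax errors, crashes, failed asserts
    return True


def pass_rate(results: list[bool]) -> float:
    """Fraction of problems solved on the first attempt (0.0 if none)."""
    return sum(results) / len(results) if results else 0.0


# Hypothetical example: two problems, one solved correctly, one with a bug.
problems = [
    {
        "solution": "def add(a, b):\n    return a + b",
        "tests": ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"],
    },
    {
        "solution": "def is_even(n):\n    return n % 2 == 1",  # buggy
        "tests": ["assert is_even(4) == True", "assert is_even(7) == False"],
    },
]

results = [passes_tests(p["solution"], p["tests"]) for p in problems]
print(f"pass rate: {pass_rate(results):.0%}")  # pass rate: 50%
```

A model that gets most problems right still scores below 100% here, which is exactly why the number is useful for tracking progress between releases rather than as a pass/fail verdict.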
