Damn, show us your brilliant prompt then. LLMs cannot do this, not even in python, of which there are libraries like Blacksheep that honestly make it a trivial task.
My point is that you shouldn't expect to one shot everything. Have it start by writing a spec, then outline classes and methods, then write the code, and feed it debug stuff.
Well sure, but that wasn't what we were discussing. The original comment says they use that as their benchmark. While their coding task is a bit complex compared to other benchmarking prompts, it's not that crazy. Here is an example of prompts used for benchmarking with Python for reference:
At the end of the day LLMs in their current iteration aren't intended to do even moderately difficult tasks on their own but it's fun to query them to see progress when new claims are made.
Exactly, expecting one shot 100% working code with one prompt is ridiculous at this point. It's why libraries like Aider are so useful, because you can iteratively diff generated code until it's useable.
Sure it's impossible at this point, but the point of a benchmark isn't to complete the task it's to test it's efficacy overall and to see progress. None of them are 100% at even the simplistic python benchmarks, doesn't mean we shouldn't measure that capability. But sure, I get it. That's not how they are intended to be used but that's also not the point the commenter was laying out.
Prompts like yours (I ask them for a fluid dynamics simulator which also doesn't succeed) inform us of the level they have reached. A useful benchmark, given how many of the formal ones they breeze through.
I'm glad they can't quite manage this yet. Means I still have a job.
No he is right, he is saying taken to the extreme. The point is the more and more specific you have to prompt, the more you are actually contributing to the result yourself and the less the model is
Yes but the build up isn't manual. You go patching prompts with responses until the final result. The last prompt will be almost the whole code complete, obviously.
Well now we get into information density and Komolgorov complexity. The more complicated your desired output program is, the more information you'll have to put in, ie, more complicated prompts.