| HN Mirror



	by Kwantuum 1280 days ago
	A lot of the comments seem to talk about the inevitable AI event horizon, but unless I'm misreading this article, the results are flat-out bad. Even the 6-billion-parameter model barely reaches a 50% success rate on a tiny problem that any human with basic programming knowledge could fix trivially. Note the log scale of the graph.

3 comments

hellodanylo 1279 days ago

Yeah, I am also struggling to interpret the metrics in this post positively.

The 50% success rate is also the best of 3200 completions. For the best of 1 completion, the success rate is in the low single digits.

I think the lesson here is that these models bring a lot more value when you: 1. have unit tests, 2. can afford the compute/time to let the model try many solutions, 3. have enough isolation to run unverified code.
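To make that concrete, here is a rough sketch of the sample-then-filter loop those three points describe. Everything in it is hypothetical: the `passes_tests` and `first_passing` helpers, the toy `add` task, and the hand-written lambdas standing in for model completions. It is not the article's setup, just one way the idea could look.

```python
from typing import Callable, Iterable, Optional

# A candidate is any function that claims to add two integers.
Candidate = Callable[[int, int], int]


def passes_tests(candidate: Candidate) -> bool:
    """Run hypothetical unit tests for an `add` function against a candidate."""
    cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
    try:
        return all(candidate(*args) == expected for args, expected in cases)
    except Exception:
        # A crashing candidate simply counts as a failed attempt.
        return False


def first_passing(candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Return the first candidate that passes the tests, or None if none do."""
    for candidate in candidates:
        if passes_tests(candidate):
            return candidate
    return None


# Hand-written stand-ins for model completions (assumption: real completions
# would be untrusted source text and should run in a sandbox, not in-process).
completions = [
    lambda a, b: a - b,  # wrong
    lambda a, b: a * b,  # wrong
    lambda a, b: a + b,  # correct
]

chosen = first_passing(completions)
print(chosen(2, 3) if chosen else "no completion passed")
```

The unit tests do the selecting, so more completions only help if the tests are good enough to reject the wrong ones.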


zaidhaan 1279 days ago

They do note that the models "tend to do better when prompted with longer code generation tasks".

But yes, the choice of scales for the graph was rather peculiar.


kdnvk 1279 days ago

6 billion parameters is by no means large.
