Hacker News new | ask | show | jobs
by vintermann 272 days ago
They test specific prompts with temperature 0. It is of course possible that all their tests prompts were lucky, but still then, shouldn't you see an immediate drop followed by a flat or increasing line?

Also, from what I understand from the article, it's not a difficult task but an easily machine checkable one, i.e. whether the output conforms to a specific format.

2 comments

If it was random luck, wouldn't you expect about half the answers to be better? Assuming the OP isn't lying I don't think there's much room for luck when you get all the questions wrong on a T/F test.
With T=0 on the same model you should get the same exact output text. If they are not getting it, other environmental factors invalidate the test result.