Hacker News new | ask | show | jobs
by tornato7 953 days ago
I like the test but do you take multiple samples / runs of a result? IMO for a proper benchmark you should ask it the same question 10+ times and show a confidence interval, otherwise you don't know if it's just a fluke or a lucky guess.
1 comments

Ahh good suggestion, I should clarify this in the article. I tried to compensate with volume -- I used a set of 200 questions for the testing. I was using temperature 0, so I'd get the same answer if I ran a single question multiple times.