by krajzeg 1117 days ago
Do we have a good, objective benchmark set of prompts in existence somewhere? If not, I think having one would really help with tracking changes like that.

I'm always skeptical of subjective impressions that something hard to quantify has gotten worse or better, especially when there is as much hype as there is around the various AI models.

One explanation for those feelings is that the model really is getting significantly worse over time. Another is that the hype is wearing off: as you get used to the shiny new thing, you become more critical of its shortcomings.
