by krajzeg 1117 days ago

Do we have a good, objective benchmark set of prompts in existence somewhere? If not, I think having one would really help with tracking changes like that.

I'm always skeptical of subjective impressions that something hard to quantify has gotten better or worse, especially when there is as much hype as there is around the various AI models.

One explanation for those impressions is that the model really is getting significantly worse over time. Another is that the hype is wearing off: as you get used to the shiny new thing, you become more critical of its shortcomings.

3 comments

jonnycomputer 1117 days ago

https://github.com/FranxYao/chain-of-thought-hub

jonnycomputer 1117 days ago

https://pub.towardsai.net/meet-vicuna-the-latest-metas-llama...

jonnycomputer 1117 days ago

https://chat.lmsys.org/?arena
