Hacker News
by
growdark
236 days ago
I'd love to see a benchmark that tests different LLMs for slop, not necessarily limited to code. That might be even more interesting than ARC-AGI.
3 comments
Bolwin
236 days ago
See the writing benchmarks here:
https://eqbench.com/creative_writing_longform.html
Der_Einzige
236 days ago
Note this is the same first author
jampa
236 days ago
Not a benchmark per se, but there is a "Not x, but y" Slop Leaderboard:
https://www.reddit.com/r/LocalLLaMA/comments/1lv2t7n/not_x_b...
topaz0
236 days ago
100% of LLM output is slop. Done.