HN Mirror



	by growdark 236 days ago
	I'd love to see a benchmark that tests different LLMs for slop, not necessarily limited to code. That might be even more interesting than ARC-AGI.

3 comments

Note that this is the same first author.

Not a benchmark per se, but there is a "Not x, but y" Slop Leaderboard:

100% of LLM output is slop. Done.