Hacker News
by wongarsu, 4 hours ago
Tbf, most of the "real benchmarks" have issues that are just as bad. Assessing LLM performance is just hard.