Hacker News
by operatingthetan 63 days ago
A more interesting benchmark would probably be one scored on whether the LLM finds exploits in the benchmark itself.