Y
Hacker News
new
|
ask
|
show
|
jobs
by
syntex
54 days ago
These benchmarks means very little. The real test is model + harness so agentic system that can fulfill given goals.