Hacker News
by hitarpetar 212 days ago
Do you find that a 40-60% failure rate fits your definition of correctness? I don't think they really needed to spell this failure out...
https://www.salesforce.com/blog/why-generic-llm-agents-fall-...