by vintagedave
413 days ago
Clickbait headline, and it's reporting something from Business Insider (itself IMO a terrible website these days), but:

> the results were dismal. The best-performing model was Anthropic's Claude 3.5 Sonnet, which struggled to finish just 24 percent of the jobs assigned to it. The study's authors note that even this meager performance is prohibitively expensive, averaging nearly 30 steps and a cost of over $6 per task.

and other AIs were worse.

A 24% success rate is a problem, but the cost seems reachable. I can't access the full BI article to know the scope of the average task attempted, but anything of substance is worth $6.