by vintagedave 413 days ago

Clickbait headline, and it's reporting something from Business Insider (itself IMO a terrible website these days), but:

> the results were dismal. The best-performing model was Anthropic's Claude 3.5 Sonnet, which struggled to finish just 24 percent of the jobs assigned to it. The study's authors note that even this meager performance is prohibitively expensive, averaging nearly 30 steps and a cost of over $6 per task.

and other AIs were worse.

sokoloff 413 days ago

$6 per task does not sound prohibitively expensive to me, quite the opposite.

A 24% success rate is a problem, but the cost seems reachable. I can't access the full BI article to know the scope of the average task attempted, but anything of substance is worth $6.

beefnugs 412 days ago

That would be the cost per task on top of all the regular humans the business still needs (the same experts as today, now fixing all the AI's mistakes). So mayyybbeee, if you go through all that trouble while also telling your employees you are trying really hard to replace them at the drop of a hat, you can get a couple of extra features per quarter.

sokoloff 412 days ago

Sure, while the AI is busily shitting out 3 mistakes for every success at $6 per attempt (roughly $25 and 3 errors to fix per success), you need the same [or even greater] number of humans to accomplish the overall job.

But if you can identify the slice of work that AI can do with a 98% or 99% unattended success rate, then you can steer the humans you have to higher-value work, having released them from 20+% of their tasks at a cost of only $6/task.

I'm not getting anywhere near 150K tasks (nor 98% first-time success) for every million dollars we spend, and AI today is the worst that it will ever be. $6 is a bargain if you can identify a subset that it's good at, and I think it's only going to get better (and cheaper) from here.

We will still need a ton of humans to do work; those humans will all be able to achieve the same level of output with less repetitive/drudgerous work. I think it will be similar to how we went from 80% of Americans being farmers to under 2% now, or how we reduced the number of horses per person in the US by 5 orders of magnitude since 1900. No one is now wishing for the days when 4/5 of us farmed or when we waded around piles of horse manure in cities.
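
The cost arithmetic in this comment (roughly $25 per success at a 24% success rate, and the number of $6 attempts a million dollars buys) can be checked with a rough sketch. The function names are made up for illustration, and the model assumes every attempt costs the same $6 and succeeds independently, which the quoted study does not claim.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful task (assumes independent, equal-cost attempts)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


def failures_per_success(success_rate: float) -> float:
    """Expected number of failed attempts for each successful one."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return (1 - success_rate) / success_rate


def attempts_per_budget(budget: float, cost_per_attempt: float) -> int:
    """Whole number of attempts a budget pays for."""
    return int(budget // cost_per_attempt)


if __name__ == "__main__":
    # Figures quoted in the thread: $6 per task, 24% success rate.
    print(f"cost per success: ${cost_per_success(6, 0.24):.2f}")
    print(f"failures per success: {failures_per_success(0.24):.2f}")

    # Hypothetical "good subset" at a 98% success rate and a $1M budget.
    attempts = attempts_per_budget(1_000_000, 6)
    print(f"attempts per $1M: {attempts:,}")
    print(f"expected successes at 98%: {attempts * 0.98:,.0f}")
```

At the quoted figures this works out to about $25 and a little over 3 failures per success, and a million dollars covers about 166,666 attempts at $6 each, in the same ballpark as the 150K tasks mentioned above.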