by Workaccount2 513 days ago
There is a frustrating gap between benchmarks and real world ability.

O1 or even O3 might be able to crack academic-level math problems, but I still wouldn't trust it to correctly fill out a McDonald's application using a PDF of my resume and a calendar of my availability.


A lot of that has to do with certainty. The GPTs and Claudes will replace graduate-level research assistants and other workers in jobs that demand very high skill but have soft success criteria long before they replace travel agents, whose work requires little skill but has very hard criteria for success.