| HN Mirror



	by iLoveOncall 182 days ago
	> current models have almost 100% success rate on tasks taking humans less than 4 minutes

	The contrary is easily verifiable by everyone individually. It's nowhere near 100%, or even 50%, for few-minute tasks, even with the best models, in real-world situations.

1 comments

ben_w 182 days ago

I've only noticed that combination (failure of short everyday tasks from SOTA models) on image comprehension, not text.

So some model will misclassify my American black nightshade* weeds as a tomato, but I get consistently OK results for text output from good models unless it's a trick question.

* I reckon, at least; looked like this to me: https://en.wikipedia.org/wiki/Solanum_americanum#/media/File...

iLoveOncall 182 days ago

The research from METR, and my comment, are exclusively about software development tasks.

ben_w 182 days ago

Re-reading my comment, I realise I missed the most important part, the question.

What examples can you give of "real world situations" where they fail?

Obviously I don't want to use them for whatever that is.
