| HN Mirror

Hacker News


	by distalx 600 days ago
	In their "Developing a computer use model" post, they mention:

	> On one evaluation created to test developers’ attempts to have models use computers, OSWorld, Claude currently gets 14.9%. That’s nowhere near human-level skill (which is generally 70-75%), but it’s far higher than the 7.7% obtained by the next-best AI model in the same category.

	Which model does "the next-best AI model in the same category" refer to here?