| HN Mirror



	by findjashua 110 days ago
	failed the car wash test. i think instead of positioning it as a general-purpose reasoning model, they'd have more success focusing on a specific use case (e.g. coding agent) and benchmarking against the sota open models for that use case (e.g. qwen3-coder-next)

1 comment

Jianghong94 110 days ago

Honestly, I don't understand why they, or the makers of any fast-and-error-prone model, position it as a coding agent; in my experience, I'd much rather work with a slow-but-correct model and let it run a longer session than hand-hold a fast-but-wrong one.
