| HN Mirror



	by jasonjmcghee 482 days ago
	Just want to say nice job and keep it up. Thrilled to start playing with 3.7. In general, benchmarks seem to be very misleading in my experience, and I still prefer sonnet 3.5 for _nearly_ every use case, except massive text tasks, for which I use gemini 2.0 pro with its 2M token context window.

2 comments

jasonjmcghee 482 days ago

An update: "code" is very good. Just did a ~4 hour task in about an hour. It cost $3, which is more than I usually spend in an hour, but very worth it.

martinald 482 days ago

I find the webdev arena tends to match my experience with models much more closely than other benchmarks: https://web.lmarena.ai/leaderboard. Excited to see how 3.7 performs!
