HN Mirror



	by skyzouwdev 305 days ago
	Really like the shift from synthetic benchmarks to actual reader engagement — feels way more aligned with what “good writing” actually means. Curious if you’ve noticed certain models consistently improving more with feedback than others.

1 comment

jauws 305 days ago

Thanks! Anecdotally, I'd say Claude 3.7 tends to improve the most, but judging by the leaderboard, some people really prefer Grok-3 lol.
