Hacker News new | ask | show | jobs
by skyzouwdev 305 days ago
Really like the shift from synthetic benchmarks to actual reader engagement — feels way more aligned with what “good writing” actually means. Curious if you’ve noticed certain models consistently improving more with feedback than others.
1 comments

Thanks! Anecdotally, I'd tend to say that Claude 3.7 tends to improve the most, but it seems like (via the leaderboard), some people really prefer Grok-3 lol.