| HN Mirror

	by jasonjmcghee 41 days ago
	That's Grok 4.2, not 4.3, right? And why are you comparing to gpt-4.1, as opposed to one of the 6(?) model releases since then? I would have expected gpt 5.5.


michaelbuckbee 41 days ago

Good catch, there was an issue with the second hardest thing in programming (caching).

Here's an updated eval with the proper models https://a3bmfqfom3.evvl.io/


Reebz 41 days ago

Claude 4.7 is the clear winner to me for manager and formal report updates.

As an ex-senior exec (hundreds of staff), the bolded timeline impact is a particular nuance that I would expect a Lead/Director to format for a VP+ audience. Interesting that none of the other models did that. My eyes immediately went to the impact statement, then worked back to the context to grasp the whole situation.


wamatt 41 days ago

Thanks. From where I'm looking, Grok 4.3 and Claude 4.7 do a better job on the informal close friend/coworker vibe.

ChatGPT sounds fake, with formal phrasing (for the specific close-friend context), em-dashes, and capitalization. Hence ChatGPT does not, imo, grok the assignment ;)


andai 41 days ago

Is it me or did GPT get noticeably more natural in word choice recently? You can see it between 4.1 and 5.5 here, but I'm not sure when that happened. (My guess would be one of the recent 5.x releases.)

Edit: I meant specifically the absence of bizarre phrasing. That seems to have improved.


reissbaker 41 days ago

Wow, I'm surprised. Grok 4.3 actually is noticeably better than the other two for the close-friend variant. Surprisingly, I found Claude the cringiest of the three!
