|
I did a quick eval comparing Grok 4.3, Opus 4.7 and GPT 4.1 and they actually seem pretty similar: https://ofw640g9re.evvl.io/ They all did pretty well at a more "formal" tone, but GPT4.1 was the only one that didn't make me cringe with a "casual" tone. [edit] fwiw, grok was also the fastest+cheapest model, claude was slowest and priciest. |
What makes one cringe and another recognize as familiar and comfortable is also pretty subtle and hard to define. These things need nuanced descriptions and examples to actually get right, and it's in understanding those nuances and figuring out the register of the examples that Grok outshines the others.