by robots0only
257 days ago
In all of these posts, someone claims Claude is the best, then somebody else says they have tried a bunch of times and for them Gemini is the best, while others find GPT-5 supreme. Obviously, all of these are subjective, narrow experiences. My conclusion is that all frontier models are both good and bad, with no clear winner, and that making good evals is really hard.
* Gemini has the highest ceiling out of all of the models, but has consistently struggled with token-level accuracy. In other words, its conceptual thinking is well beyond other models, but it sometimes makes stupid errors when talking. This makes it hard to use reliably for tool calling or structured output. Gemini is also very hard to steer, so when it's wrong, it's really hard to correct.
* Claude is extremely consistent and reliable. It's very, very good at the details, but will start to forget things if a task gets too complex. The good news is Claude is very steerable and will remember those details if you remind it.
* GPT-5 seems to be completely random for me. It's so inconsistent that it's extremely hard to use.
I tend to use Claude because I'm the most familiar with it and I'm confident that I can get good results out of it.