Hacker News
by gertlabs 18 days ago
GPT 5.5 does significantly outperform Opus 4.7 in the coding parts of our evals.

We also incorporate live decision-making in social games (where GPT 5.5 has actually regressed from earlier models, which shouldn't be a huge surprise if you've ever tried talking it out of some of its nits).

We are still looking for a way to integrate "logical" intelligence with social intelligence in a less arbitrary way, so I'd take a look at the use case that applies to you (probably coding): https://gertlabs.com/rankings?mode=agentic_coding