
In intelligence/performance. It's admittedly a fuzzy notion. Most benchmarks will probably show decreasing gains between generations. As with time/space complexity, debating what performance/intelligence is will get into a million definitions, caveats and technicalities. But a relative comparison between inputs and outputs gives us useful information.

The inputs going into training these models (data, compute and parameters) have grown by many orders of magnitude from one gen to the next. There's a lot of fuzziness about how much better each gen has gotten, but clearly 4 is not many orders of magnitude better than 3 by any reasonable definition. This mental model isn't useful for saying how good each gen is, but it is quite useful for seeing the trend and making long-term predictions.