My gut metric says it's a ~20% increase in perceived interpretation and output complexity, whatever that means exactly. But there are plenty of eval result aggregators out there.
To me, GPT-4 seems genuinely intelligent and capable of reasoning, while GPT-3.5 does not. Many of my use cases involve giving GPT large bodies of text and asking it to reason about them. 3.5 has no clue, but 4 seems to handle it intelligently.
Overall, GPT-3.5 feels like a clueless summarizer, while GPT-4 feels like an intelligent interpreter and reasoner that I can trust.
Depending on which way you look at it, it could be 10x or 1000x the intelligence.