by datameta 974 days ago
My gut metric says it's a ~20% increase in perceived interpretation and output complexity, whatever that means exactly. But there are plenty of eval result aggregators out there.
2 comments

To me GPT-4 seems actually intelligent and capable of reasoning, while GPT-3.5 does not. Many of my use cases involve giving large bodies of text to GPT and asking it to reason about them. 3.5 has no clue, but 4 seems to handle it intelligently.

Overall, GPT3.5 feels like a clueless summarizer, while GPT4 feels like an intelligent interpreter and reasoner that I can trust.

Depending on which way you look at it, it could be 10x or 1000x the intelligence.

I think trust is a key thing you've highlighted. I find myself doubting GPT3.5, but not GPT4 at all.
Yeah, there are measurable results on things like AP bio. And those are definitely not 10x.