Whenever someone says this, I always wonder whether we're talking about the same tool. GPT-3.5 scores in the bottom 10% on a lot of tasks where GPT-4 scores in the top 10%.
Whenever someone says this, I wonder whether they're just ignoring all the errors (rose-tinted hype glasses?) or whether they're just bad at their jobs.
I've tried using GPT-4 for programming tasks, only deferring to it when I was stumped on something. I felt like it was wasting my time more often than not.
Not the same tasks. GPT-4 scored in the top 10% on simple tasks GPT-3 was already good at, and was a little less worthless on some tasks that GPT-3 was really bad at. It didn't completely revolutionize any kind of task.