Hacker News new | ask | show | jobs
by sberens 1170 days ago
> other models are in the ballpark

How true is this? From playing around with Bard and Claude, GPT-4 seems to be significantly better, especially around code generation / understanding.

2 comments

People really don't seem to understand just how far ahead OAI is in its reasoning abilities (https://crfm.stanford.edu/helm/v0.2.2/?group=reasoning)

Maybe PaLM is near there (it's not evaluated on that page) but nothing else even comes close at all

The level of denial people are willing to sink into regarding how good GPT-4 is compared to everything else is truly crazy. Not a single other project is an order of magnitude close to the quantitative and qualitative (actual experiential results, not just benchmarks) results that GPT-4 brings.
I feel that there’s significant insecurity among a lot of coders about GPT-4. A lot of them are ignoring the pace of improvement and highlighting the few off chances where it gets things wrong.
I think there's a lot of people writing boilerplate programs who are going to be freed from these menial tasks (i.e. no more Java enterprise application development, thankfully).
I've not used GPT-4, so it could be different, but regular old GPT-3.5 gets a _lot_ of things wrong.
GPT 4 is quite astounding. It might be wrong on occasion, but it will easily point you in the right direction most of the time. It still messes up, but like a twentieth of what 3.5 did. Honestly it is like an incredible rubber ducky for me. Not only can I just talk like I’m talking to a rubber duck but I can get fast, mostly informed, feedback that unblocks me. If I have a bunch of things competing for my attention I can ask gpt about one of them, a hard one, go do something else while it types out its answer, and then come back later and move on with that project.
Is a 95% reduction in errors an exaggeration or is it really that much better? Might just need to drop the $20/mo if it's really improved that much.
My favorite part about GPT-4 is that if it generates code that is wrong, and you ask it to verify what it just wrote - without telling it whether it's wrong or not, much less pointing out the specific issue - more often than not it will spot the problem and fix it right away.

And yes, it does indeed make an amazing rubber duck for brainstorming.

GPT-4 is leaps ahead, and it's improving with every new release. The latest March 23 release is significantly better than the previous one and does a LOT of heavy lifting for my code at least.

At the very least, it's a massive productivity booster.

I've had decent success with Open Assistant, an open source model. I'd say it's within the order of magnitude of ChatGPT, given the prompts I'm looking at, including reasoning prompts. This, I believe, is due to the overwhelmingly clean data that OA have managed to acquire through human volunteers.
> How true is this? From playing around with Bard and Claude, GPT-4 seems to be significantly better

I have at most moderate confidence in this hypothesis.