The level of denial people are willing to sink into regarding how good GPT-4 is compared to everything else is truly crazy. Not a single other project is an order of magnitude close to the quantitative and qualitative (actual experiential results, not just benchmarks) results that GPT-4 brings.
I feel that there’s significant insecurity among a lot of coders about GPT-4. A lot of them are ignoring the pace of improvement and highlighting the few off chances where it gets things wrong.
I think there's a lot of people writing boilerplate programs who are going to be freed from these menial tasks (i.e. no more Java enterprise application development, thankfully).
GPT 4 is quite astounding. It might be wrong on occasion, but it will easily point you in the right direction most of the time. It still messes up, but like a twentieth of what 3.5 did. Honestly it is like an incredible rubber ducky for me. Not only can I just talk like I’m talking to a rubber duck but I can get fast, mostly informed, feedback that unblocks me. If I have a bunch of things competing for my attention I can ask gpt about one of them, a hard one, go do something else while it types out its answer, and then come back later and move on with that project.
My favorite part about GPT-4 is that if it generates code that is wrong, and you ask it to verify what it just wrote - without telling it whether it's wrong or not, much less pointing out the specific issue - more often than not it will spot the problem and fix it right away.
And yes, it does indeed make an amazing rubber duck for brainstorming.
GPT-4 is leaps ahead, and it's improving with every new release. The latest March 23 release is significantly better than the previous one and does a LOT of heavy lifting for my code at least.
At the very least, it's a massive productivity booster.
I've had decent success with Open Assistant, an open source model. I'd say it's within the order of magnitude of ChatGPT, given the prompts I'm looking at, including reasoning prompts. This, I believe, is due to the overwhelmingly clean data that OA have managed to acquire through human volunteers.
Maybe PaLM is near there (it's not evaluated on that page) but nothing else even comes close at all