by sk11001 728 days ago
Both GPT-4 and 4o have been completely useless for coding for me in the past couple of weeks: constant errors, and not just your typical LLM inaccuracies. They're incapable of producing a few lines of self-consistent code, e.g. it defines a variable foo on one line and refers to it as bar on the next, or misspells it as foox.
5 comments

What language? Because I'm guessing they work well for languages with a large amount of training data like Python (in my experience), and less well for less-used languages like Zig or Clojure (haven't tried them, but that's my theory).
From my experience, GPT-4 works well with both Clojure and Zig. A lot of it depends on the way you prompt though. For example, asking to start with a C or C++ example and converting to Zig often works better than starting straight with Zig. The same strategy works with Java and Clojure too.
I use it for Rust and it's.... meh. It gets things wrong enough that I don't reach for it except to help me reference certain docs. It tends to hallucinate APIs and semantics that just don't exist. Honestly couldn't imagine using it with a dynamic language.
Python here. And like they said, only noticeable in the last few weeks.
I've been seeing this too. It's always hard to tell what's a real change vs the roll of the dice, but lately I've been having weird Python inconsistencies too, in very short snippets doing pretty simple things.
For me it has been very repetitious despite my instruction to the contrary.
I've been experiencing bizarre typos and misspellings that I've come to describe as the model being drunk. Things like it writing peremeter instead of parameter.
Yeah, misspellings were something so rare that I thought an LLM was incapable of producing them.

Yet over the past few weeks GPT-4 and 4o make them all the time. It will randomly change my postgres schema from public to publish. And, well, just see this one for yourself:

> *Using the 'kubectl cp Command*: Execute the 'czygk cp' command to copy the file from your local machine to the pod.

Today, I asked 4o how to get around conditionally executing React hooks (illegal in React), and it rewrote my code to do the same thing again, merely swapping the order of a ternary. Its performance is possibly worse than GPT-3.

Maybe they’re weakening it because they expanded their free tier, but it has become surprisingly bad.

The level of misspelling is insane at the moment. It happens almost 50%+ of the time. I just started using Claude 3.5 and the difference is night and day.
It's the same model though. Maybe your perception has changed.
I first noticed logprob fluctuations in GPT-4o. Perhaps the same phenomenon is also going on with Turbo. I don't recall specifics, but it was inconsistencies in variable names, meaning the same variable name got a typo somewhere, but the typo was close enough: perhaps a space vs. an underscore or something like that.

The model could be the same, but maybe something in the infra is different.
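
For anyone who wants to catch the "close enough" variable-name typos described above before running generated Python, here is a minimal sketch using only the standard library (`ast` and `difflib`). The helper name `find_near_miss_names` and the 0.6 similarity cutoff are my own assumptions, not anything from OpenAI's tooling. It ignores scoping entirely, so treat it as a rough lint rather than a real checker.

```python
import ast
import builtins
import difflib


def find_near_miss_names(source: str, cutoff: float = 0.6) -> dict[str, list[str]]:
    """Map each name that is read but never bound to similar-looking bound names.

    Hypothetical helper: scope-insensitive, so it only approximates a linter.
    """
    tree = ast.parse(source)
    bound: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                used.add(node.id)       # name is being read
            else:
                bound.add(node.id)      # name is assigned or deleted
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)         # function parameters
        elif isinstance(node, ast.alias):
            bound.add((node.asname or node.name).split(".")[0])  # imports
        elif isinstance(node, ast.ExceptHandler) and node.name:
            bound.add(node.name)        # "except Error as exc"
    unknown = used - bound - set(dir(builtins))
    return {
        name: difflib.get_close_matches(name, sorted(bound), n=3, cutoff=cutoff)
        for name in sorted(unknown)
    }


snippet = "foo = 1\nprint(foox + 1)\n"
print(find_near_miss_names(snippet))  # {'foox': ['foo']}
```

A name with an empty suggestion list (foo defined, bar used) is still reported as unbound; it just has no close match to point at.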

I can’t speak for what OpenAI is doing, but I’ve noticed those types of hallucinations occurring when I quantize a model beyond a certain point.

Maybe they are trying to cut down on memory usage?
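
To make the quantization point concrete, here is a toy sketch of how rounding error grows as the bit width shrinks. It assumes plain symmetric uniform quantization over a list of random floats; the function names are hypothetical, real inference stacks use more elaborate per-channel or per-group schemes, and nothing here says what OpenAI actually runs.

```python
import random


def quantize(weights: list[float], bits: int) -> list[float]:
    """Symmetric uniform quantization: snap each weight to a signed integer grid."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed grid")
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)            # nothing to quantize
    levels = 2 ** (bits - 1) - 1        # e.g. 127 for 8 bits
    scale = max_abs / levels            # width of one quantization step
    return [round(w / scale) * scale for w in weights]


def mean_abs_error(weights: list[float], bits: int) -> float:
    """Average absolute difference between original and quantized weights."""
    quantized = quantize(weights, bits)
    return sum(abs(a - b) for a, b in zip(weights, quantized)) / len(weights)


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # stand-in for model weights
for bits in (8, 4, 3, 2):
    print(f"{bits} bits: mean abs error = {mean_abs_error(weights, bits):.4f}")
```

The error climbs steadily as bits are removed, which is consistent with a model degrading "beyond a certain point", though a per-weight error like this says nothing directly about when output quality falls apart.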

Is it the same? On the Models page of the API docs it says that GPT-4 is using the June 13th version, which would be different from the March 23rd one.