| HN Mirror


	by sk11001 728 days ago
	Both GPT-4 and 4o have been completely useless for coding in the past couple of weeks for me: constant errors, and not just your typical LLM inaccuracies. They are incapable of producing a few lines of self-consistent code, e.g. it defines a variable foo on one line and refers to it as bar on the next, or misspells it as foox.
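
For illustration only, this kind of inconsistency can be caught mechanically in short Python snippets. The sketch below is a hypothetical helper (`undefined_names` is not from any tool mentioned in this thread) that flags names which are read but never assigned. It deliberately ignores imports, function definitions, and arguments, so treat it as a rough check rather than a real linter.

```python
import ast
import builtins

def undefined_names(source):
    """Return names that are read but never assigned anywhere in the snippet.

    Simplification: only plain assignments count as definitions; imports,
    function/class definitions, and function arguments are not tracked.
    """
    tree = ast.parse(source)
    assigned, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)
            else:
                used.add(node.id)
    # Builtins such as print are always available, so they are not reported.
    return sorted(used - assigned - set(dir(builtins)))

# A snippet with the failure described above: foo is defined,
# but the next line refers to bar and foox instead.
snippet = "foo = 1\nprint(bar + foox)\n"
result = undefined_names(snippet)
print(result)
```

Running something like this over generated snippets would at least separate "the model renamed my variable" from ordinary logic errors.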

5 comments

labrador 728 days ago

What language? I'm guessing they work well for languages with a large amount of training data like Python (in my experience), and less well for less-used languages like Zig or Clojure (I haven't tried them, but that's my theory).

rads 728 days ago

From my experience, GPT-4 works well with both Clojure and Zig. A lot of it depends on the way you prompt though. For example, asking to start with a C or C++ example and converting to Zig often works better than starting straight with Zig. The same strategy works with Java and Clojure too.

ModernMech 728 days ago

I use it for Rust and it's.... meh. It gets things wrong enough that I don't reach for it except to help me reference certain docs. It tends to hallucinate APIs and semantics that just don't exist. Honestly couldn't imagine using it with a dynamic language.

ndr_ 728 days ago

Python here. And like they said, only noticeable in the last few weeks.

heyitsguay 727 days ago

I've been seeing this too. Always hard to tell what's a real change vs the rolls of the dice lately but I've been having weird python inconsistencies too, in very short snippets doing pretty simple things.

esafak 728 days ago

For me it has been very repetitious despite my instruction to the contrary.

Zetaphor 727 days ago

I've been experiencing bizarre typos and misspellings that I've come to describe as the model being drunk. Things like it writing peremeter instead of parameter

hombre_fatal 727 days ago

Yeah, misspellings were something so rare that I thought an LLM was incapable of producing them.

Yet over the past few weeks GPT-4 and 4o have been making them all the time. It will randomly change my postgres schema from public to publish. And, well, just see this one for yourself:

> *Using the 'kubectl cp Command*: Execute the 'czygk cp' command to copy the file from your local machine to the pod.

Today, I asked 4o how to get around conditionally executing React hooks (illegal in React), and it rewrote my code to simply do it again, merely swapping the order of a ternary. Performance is possibly worse than GPT-3.

Maybe they’re weakening it because they expanded their free tier, but it has become surprisingly bad.

kake25 723 days ago

The level of misspelling is insane at the moment. It does it around 50% of the time. I just started using Claude 3.5 and the difference is night and day.

ipsum2 728 days ago

It's the same model though. Maybe your perception has changed.

ndr_ 728 days ago

I first noticed logprob fluctuations in GPT-4o. Perhaps the same phenomenon is also going on with Turbo. I don't recall specifics, but it was inconsistencies in variable names, meaning the same variable name got a typo somewhere, but the typo was close enough, perhaps a space vs. an underscore or something like that.

The model could be the same, but maybe something in the infra is different.

great_psy 727 days ago

I can’t speak for what OpenAI is doing, but I’ve noticed those types of hallucinations occurring when I quantize a model beyond a certain point.

Maybe they are trying to cut down on memory usage?
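
To give a rough feel for why that happens, here is a toy sketch of symmetric uniform quantization on random numbers standing in for weights. Everything in it is an assumption for illustration: the function names are made up, the "weights" are just Gaussian samples, and real schemes (per-channel scales, group-wise quantization, and so on) are more careful. It says nothing about what OpenAI actually runs.

```python
import random

def quantize(weights, bits):
    """Symmetric uniform quantization to `bits` bits, then dequantize back to floats."""
    levels = 2 ** (bits - 1) - 1              # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]

def mean_abs_error(original, approx):
    """Average absolute difference between original and dequantized values."""
    return sum(abs(x - y) for x, y in zip(original, approx)) / len(original)

random.seed(0)
# Stand-in "weights": 10,000 samples from a standard normal distribution.
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]

errors = {bits: mean_abs_error(weights, quantize(weights, bits)) for bits in (8, 4, 2)}
for bits, err in errors.items():
    print(f"{bits}-bit: mean abs error {err:.4f}")
```

The reconstruction error grows as the bit width shrinks, which fits the idea that the output stays usable down to some precision and then degrades quickly.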

edub 727 days ago

Is it the same? On the Models page of the API docs it says that GPT-4 is using the June 13th version, which would be different from the March 23rd one.
