| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by tagyro 1109 days ago

In general, the quality of the replies seems to have been affected. More specifically:

- the context window seems to have been reduced: I only have access to the 4k model, but even when the prompts and replies are well under this limit, the model seems to loose context, as in, the replies have no connection to the prompt;

- hallucinations: while this has been a general issue with LLM's and even taking into account a reduction in the context window, GPT-4 seems to hallucinate a lot more now;

- not answering: prompts for which I have received answers before get a "my cut-off date ..." answer now.

I used GPT-4 mostly via the API and I saved the threads (history) of all prompts I've been sending, so it's fairly easy for me to "test" the before/after changes.

Some tests that are easy to run:

- the "apple" test: write 10 sentences that end with the word "apple" - I used to get 8/10 most of the time, sometimes even 10/10. Now it's more like 6-7/10.

- a friend suggested the "jwst" prompt: when was the jwst launched? Depending on the answer (sometimes it says it is scheduled to launch, sometimes it answers correctly), subsequently ask "what hour and minute"

- use a 1-2k prompt and analyse the answer - this shouldn't be an issue with a 4k limit, but sometimes it will hallucinate or respond with something totally unrelated

temperature: 0.2, top_p: 1, max_tokens: 4000, streaming: true

edit: I've noticed that speed has been improved, but that doesn't really help when the quality suffers; also, data point of 1, but when using it from Germany (Europe?), the quality suffers, when prompting via vpn from US, it seems to be slower but the quality is better (but again, take that with a big grain of salt)