GPT-4 has gotten worse in the ChatGPT UI as well since the May update. It makes many more strange errors and has trouble reasoning through complex problems and ambiguity. Prompts similar to ones that worked fine for me last month now require multiple rounds of feedback. I’d switch to using the API, but I’d exceed the equivalent of $20 in usage pretty quickly.
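For anyone wanting to check where that break-even point sits for their own usage, here is a rough back-of-the-envelope sketch. The function name and the example prices are placeholders I made up for illustration, not OpenAI's actual rates; plug in whatever the pricing page currently lists.

```python
def tokens_for_budget(budget_usd, prompt_price_per_1k, completion_price_per_1k,
                      prompt_share=0.5):
    """Estimate how many total tokens a budget buys.

    Prices are in USD per 1,000 tokens. prompt_share is the assumed
    fraction of tokens that are prompt (input) tokens; the rest are
    completion (output) tokens.
    """
    # Blended price per 1K tokens, weighted by the prompt/completion mix.
    blended = (prompt_share * prompt_price_per_1k
               + (1 - prompt_share) * completion_price_per_1k)
    return budget_usd / blended * 1000


# Hypothetical prices, NOT real rates: $0.01 per 1K prompt tokens
# and $0.03 per 1K completion tokens, with a 50/50 token mix.
monthly_tokens = tokens_for_budget(20, 0.01, 0.03)

# Assuming roughly 2,000 tokens per exchange (also a made-up figure),
# this is how many exchanges the $20 would cover.
exchanges = monthly_tokens / 2000
```

Long conversations resend the whole history as prompt tokens on every turn, so real usage tends to burn through the budget faster than a flat per-exchange estimate suggests.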
The models tend to degrade when trained to be safer.
A GPT-4 talk on YouTube by Microsoft researchers documented this phenomenon with the 'TikZ unicorn' evolution shown in Microsoft's "Sparks of Artificial General Intelligence" paper on GPT-4.
The model gets qualitatively better with more training, and then degrades when trained to be safer (against racism, sexism, etc.), but it is not entirely clear why. The two would seem very unrelated, especially considering the work done in LM editing (ROME/MEMIT) and the decent localization of knowledge seen there.
So, perhaps both the "I'm sorry I can't..." and 'strange errors' are not entirely orthogonal.
To me it is clear why. Imagine someone told you "answer immediately, off the top of your head: what's the best seasoning?". You'd just blurt out whatever specific thing you associate with pleasing seasoning (and that would be a good answer). Now imagine someone said "answer immediately, off the top of your head, but without offending any culture or gender, without cultural bias, and without presuming anything about the listener's socio-economic status (and if you fail at any of these, someone dies): what's the best seasoning?"
Even setting aside the compromising and second-guessing this introduces into the answer space, only a fraction of your brain is now left to think about the question itself, simply because it has to hold all that other stuff in mind.
Yes! I’ve noticed this too; it’s just slightly less sharp. It probably has to do with how much trouble they are having servicing all of the demand, so they have rolled out a scaled-down version that requires less compute.
Nah. Most of the responses are "As an AI language model, I can't...", even if you ask about information you provided yourself.
The API is where it's at. There are wrappers for it that recreate the same chat look and feel and can run on Vercel or other very low-cost providers: some with a simpler UI, some with more features, some replicating the UI exactly.
I was under the impression that it was mostly bound by GPU VRAM, but that once the model is loaded, it could produce output quickly? I'm probably oversimplifying things...
The latest gpt-3.5-turbo model generates very quickly and cheaply (in part due to some recently discovered optimization techniques... older versions cost 10x more). While the hardware required to run GPT-4 is currently unknown, it generates considerably slower on average, and its much higher price points to a higher hardware cost.