|
Could be. But it could also be that those people (myself included) are right. It's not that this is without precedent: there's a paper and a YouTube video with a Microsoft person saying on record that GPT-4 started to get less capable with every release, ever since OpenAI switched focus to "safety" fine-tuning. MS actually benchmarked it by repeatedly applying the same test (unicorn drawing in TikZ), and that was even before the public release.

Myself, sure, it may be the novelty effect, or the Baader–Meinhof phenomenon, but in the days before this thread, I observed that:

- Bing Chat (which I hadn't used until about a week ago; before that, I used GPT-4 API access) has been giving surface-level and lazy answers. I blamed, and still mostly blame, the search capability, as I noticed GPT-4 (API) through TypingMind also gets dumber if you enable web search (which, in the background, adds a substantial amount of instructions to the system prompt). However:
- GPT-4 via Azure (at work) and via the OpenAI API (personal) both started to get lazy on me. Until about 2-3 weeks ago, they would happily print and reprint large blocks of code for me; in the last week or two, both models started putting in placeholder comments. I noticed this because I use the same system prompt for coding tasks, and the first time the model ignored my instructions to provide a complete solution, opting to add placeholder comments instead, was quite... startling.
- In those same 2-3 weeks, I've noticed GPT-4 via Azure being more prone to give high-level overview answers and to tell me to ask for more help if I need it. (I don't know if this affected GPT-4 via the OpenAI API; it's harder to notice with the type of queries I do for personal use.)

All in all, I've noticed that over the past 2-3 weeks, I've had to do much more hand-holding and back-and-forth with GPT-4 than before. Yes, it's another anecdote, and it might be novelty or Baader–Meinhof, but with so many similar reports and known precedents, maybe there is something to it. |
FWIW, I was pretty convinced this happened with DALL-E 2 for a little while, and again, maybe it did to some extent (they at least decreased the number of images per request, so the odds of a good one appearing decreased). But when I looked back at some of the earlier images I had linked for people on request threads, I found there were more duds than I remembered. The good ones were just so mind-blowing at first that it was easy to ignore the bad responses (plus it was free then).