Hacker News
by mjr00 870 days ago
On the other hand, the rate of change isn't constant, and there's no guarantee that the incredible progress of the past ~2 years in the LLM/diffusion/"AI" space will continue. Take computer gaming graphics as an example: compare the evolution from Wolfenstein 3D (1992) to Quake 3 Arena (1999), which is an absolute quantum leap. Now compare Resident Evil 7 (2017) and Alan Wake 2 (2023): it's an improvement, but nowhere near the same scale.

We've already seen a fair bit of stagnation in the past year, as ChatGPT gets progressively worse while the company focuses more on neutering results to limit its exposure to legal liability.

Again, it's very strange to see one particular product from one particular company taken to represent the entire technology.

If Windows 11 is far worse than Windows XP or Linux on many metrics, does that mean the technology is useless?

It's one product with a very particular vision imposed on it. Windows 11 being slow because it reports several GB of user data in the first few minutes of use does not mean all new operating systems are slow. Similarly, one older generative AI system behind a web UI (ChatGPT) producing output that doesn't obey physics does not mean all multimodal models will produce output unsupported by physics. Many works have already shown that a good portion of the problems in GPTs can be fixed with methods such as ROME, rl-sr, sheaf NNs, etc.

My point isn't even that certain capabilities may get better in the future, but that they are already better today, just not integrated into certain models.

>ChatGPT gets progressively worse

In blinded human comparisons, newer models perform better than older ones: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboar...

That website doesn't load for me, but anyone who uses ChatGPT semi-regularly can see it's getting steadily worse if you ask for anything that even borders on risqué. It has even refused to give me things like bolt torque specs because of "risk".

It could be bias; that's why we use blinded comparisons for a more accurate rating. For what it's worth, I use it often, and in my experience it hasn't gotten worse over time.

Well, I can't load that website, so I can't assess their methodology. But I'm telling you it is objectively worse for me now, and many others report the same.

Edit: the website finally loaded for me, and while their methodology is listed, the actual prompts they use are not. The only example prompt is "correct grammar: I are happy", which does nothing to assess what we're talking about: ChatGPT's inability to deal with "risky" subjects (where "risky" means "things Americans think it's icky to talk about").

There is no fixed set of prompts: humans ask the models questions in a chat without knowing which models they're talking to, then pick the answer they prefer.

"Worse" is really subjective. More limited on a specific set of topics? Sure. Harder to trick into getting around those topic bans? Sure.

Worse overall? You can use ChatGPT 4 and 3.5 side by side and see an obvious difference.

The refusal in your specific example seems fairly reasonable. Is there liability in saying bolt X can handle torque Y if that turns out not to be true? I don't know. What if that bolt causes an accident and someone dies? I'm sure a lawyer could argue that case if ChatGPT gave a bad answer.