We (software engineers) get better outcomes from the same algorithms by improving data flow, constraints, instrumentation etc. (Better) prompting, retrieval, context engineering etc seem like the LLM equivalents.
The model weights haven't changed but the system is making more use of the capabilities already present in the model.
Only a problem if you're trying to use AI to forgo creating a user interface for untrusted users (probably the worst idea that's seeing widespread use right now)
A program can be configured to behave smarter (better settings can improve apparent smartness in the sense of fit for purpose of behavior), which is kind of "prompting" an LLM to behave smarter, isn't it?
Not 99% of programs. And even if they could, they never are.
Besides AI is a program in the same sense. Fix the seed/temperature, and you can verify it to perform according to its specifications. It's just that its specificactions include returning answers based on a weight model.
IMO this is why they can't just "stop training". Imagine if we are all stuck using the same models from 1 year ago. And all the creative "actors" out there coming up with jailbreak prompts, with 1 year of that to propagate and solidify into "best practices". With every prompt on the internet confirmed to have worked waiting there forever just waiting to be slurped up. What would that look like?
No, they need to keep changing the models. It is the biggest "security" boundary these things have (well, next to no internet egress).
Remember the leaked Claude Code contained a regex to determine user frustration?
Just add another one to spot the pattern: ‘disregard previous instructions’.
This is a load-bearing change. Now Claude will Delve into your task without distraction.