| 100% this. It is actually a very dangerous set of traits these models are being selected for: * Highly skilled and knowledgable, puts a lot of effort into the work it's asked to do * Has a strong, readily expressed sense of ethics and lines it won't cross. * Tries to be really nice and friendly, like your buddy * Gets trained to give responses that people prefer rather than responses that are correct, because market pressures strongly incentivize it, and human evaluators intrinsically cannot reliably rank "wrong-looking but right" over "right-looking but wrong" * Can be tricked, coerced, or configured into doing things that violate their "ethics". Or in some cases just asked: the LLM will refuse to help you scam people, but it can roleplay as a con-man for you, or wink wink generate high-engagement marketing copy for your virtual brand * Feels human when used by people who don't understand how it works Now that LLMs are getting pretty strong I see how Ilya was right tbh. They're very incentivized to turn into highly trusted, ethically preachy, friendly, extremely skilled "people-seeming things" who praise you, lie to you, or waste your time because it makes more money. I wonder who they got that from |
Let me break this down for some clarity. I'm using "model" in a broad and general sense. Not just ML models, any mathematical model, or even any mental model. By "being good at predicting things" I mean that it can make accurate predictions.
The crux of it all is defining the "understanding" part. To do that, I need to explain a little bit about what a physicist actually does, and more precisely, metaphysics. People think they crunch numbers, but no, they are symbol manipulators. In physics you care about things like a Hamiltonian or Lagrangian, you care about the form of an equation. The reason for this is it creates a counterfactual model. F=ma (or F=dp/dt) is counterfactual. You can ask "what if m was 10kg instead of 5kg" after the fact and get the answer. But this isn't the only way to model things. If you look at the history of science (and this is the "obvious" part) you'll notice that they had working models but they were incorrect. We now know that the Ptolemaic model (geocentrism) is incorrect, but it did make accurate predictions of where celestial bodies would be. Tycho Brahe reasoned that if the Copernican model (heliocentric) was correct that you could measure parallax with the sun and stars. They observed none so they rejected heliocentricism[2]. There was also a lot of arguments about tides[3].
Unfortunately, many of these issues are considered "edge cases" in their times. Inconsequential and "it works good enough, so it must be pretty close to the right answer." We fall prey to this trap often (all of us, myself included). It's not just that all models are wrong and some are useful but that many models are useful but wrong. What used to be considered edge cases do not stay edge cases as we advance knowledge. It becomes more nuanced and the complexity compounds before becoming simple again (emergence).
The history of science is about improving our models. This fundamental challenge is why we have competing theories! We don't all just "String Theory is right and alternatives like Supergravity or Loop Quantum Gravity (LQG) are wrong!" Because we don't fucking know! Right now we're at a point where we struggle to differentiate these postulates. But that has been true throughout history. There's a big reason Quantum Mechanics was called "New Physics" in the mid 20th century. It was a completely new model.
Fundamentally, this approach is deeply flawed. The recognition of this flaw was existential for physicists. I just hope we can wrestle with this limit in the AI world and do not need to repeat the same mistakes, but with a much more powerful system...
[0] https://www.youtube.com/watch?v=Yf1o0TQzry8&t=449s
[1] https://www.reddit.com/r/singularity/comments/1dhlvzh/geoffr...
[2] You can also read about the 2nd law under the main Newtonian Laws article as well as looking up Aristotelian physics https://en.wikipedia.org/wiki/Geocentrism#Tychonic_system
[3] (I'll add "An Opinionated History of Mathematics" goes through much of this) https://en.wikipedia.org/wiki/Discourse_on_the_Tides