by jstummbillig
356 days ago
I think more to the point: The authors of this research don't really understand what they did. It's similar to having no clue how something complex, like the world economy, works, making a random modification to it, and reporting that, gee, something unexplainable and bad happened and it's all really very brittle. This is simply a property of complex systems in the real world. Virtually nobody has a definitive understanding of them, and, more so, there are often contrarian views on what the facts are. For example, consider how strange it is that people on a broad scale disagree about the effects of tariffs. The ethics that govern the pros and cons, sure. But the effects? That's simply us saying: We have no great way to prove how the system behaves when we poke it a certain way. While we are happy to debate what will happen, nobody thinks it strange that this is what we debate to begin with. But with LLMs it's a big deal. Of course all these things are theoretically explainable. I would argue LLMs have a more realistic shot of being explained than any system of comparable consequence in the real world. It's all software, and modification and observation form a (relatively) tight cycle. Things can be tested without people suffering. That's pretty cool.
|
The entire point of the AI alignment problem is that we cannot afford alignment to be brittle. Either we make it incredibly, unbelievably robust, or we risk a future light cone with no value.