by craigus
302 days ago
"New science" phooey. Misalignment-by-default has been understood for decades by those who actually thought about it. S. Omohundro, 2008:
"Abstract. One might imagine that AI systems with harmless goals will be harmless. This paper instead shows that intelligent systems will need to be carefully designed to prevent them from behaving in harmful ways. We identify a number of “drives” that will appear in sufficiently advanced AI systems of any design. We call them drives because they are tendencies which will be present unless explicitly counteracted."
https://selfawaresystems.com/wp-content/uploads/2008/01/ai_d...

E. Yudkowsky, 2009:
"Any Future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals, will contain almost nothing of worth."
https://www.lesswrong.com/posts/GNnHHmm8EzePmKzPk/value-is-f...

But semantics phooey. It's interesting to read these abstracts and compare the alignment concerns they had in 2008 to where we are now. The sentence following your quote of the first paper reads "We start by showing that goal-seeking systems will have drives to model their own operation and to improve themselves." This was a credible concern 17 years ago, and maybe it will be a primary concern in the future. But it doesn't really apply to LLMs, and for a very interesting reason: we somehow managed to get machines that exhibit intelligence without being particularly goal-oriented. I'm not sure many people anticipated this.