by comp_throw7
390 days ago
I think this post is sort of confused because, centrally, the reason "AI Alignment" is a thing people talk about is because the problem, as originally envisioned, was to figure out how to avoid having superintelligent AI kill everyone. For a variety of reasons the term no longer refers primarily to that core problem, so the reason so many things that look like engineering problems have that label is mostly a historical artifact.

Super-intelligent AI killing everyone, or even super-dumb AI killing everyone, is what the alignment problem produces given enough scale. You don't jump to the conclusion of AI killing everyone and then explain it post hoc through reward hacking; you recognize reward hacking and extrapolate. This is also why it is so important to look at alignment as an engineering problem and at what happens on the smaller scales, *because ignoring all those problems is exactly how you create the scenario of AI killing everyone...*
[0] https://en.wikipedia.org/wiki/Goodhart%27s_law
[Side note] Just look at Asimov and his robot stories. The majority of them are about alignment. His 3 laws were written as things that sound good and have an intent that would be clear to any reader, and then he pulls the rug out from under you, showing how they're naively defined and the intent isn't so obvious. Kinda like a programmer teaching their kids to make a PB&J sandwich... https://www.youtube.com/watch?v=FN2RM-CHkuI