by 0xDEAFBEAD
946 days ago
I see how some of his tweets could come across as crank-ish if you don't have a background in AI alignment. AI alignment is sort of like computer security in the sense that you're trying to guard against the unknown. If there were a button you could push that told you the biggest security flaw in the software you're writing, then writing secure software would be far easier. But instead we have to assume that bugs exist, and apply principles like defense-in-depth and least privilege to mitigate whatever exploits may exist. In the same way, much of AI alignment consists of thinking about hypothetical failure modes of advanced AI systems and how to mitigate them. I think this specific paper is especially useful for understanding the technical background that motivates Eliezer's tweeting: https://arxiv.org/pdf/1906.01820.pdf