by NumberWangMan
1158 days ago
Sorry to get heavy here: truth is not an NLP problem, it's an alignment problem. We want truth, but we don't have a reliable way to train an AI to provide the truth, only to provide things that are either true or sound true enough to fool the reward function. And even then, that may not be exactly what the AI learns to do, because there's another level of alignment problem: the "inner alignment" or "mesa-optimizer alignment" problem! With an AI like GPT, it is quirky and amusing. Once AIs get really powerful, it becomes scary, and a lot of people who understand this field much better than I do are worried it has a good chance of being deadly. Like, potentially kill-everyone-on-earth deadly.

Personally, I didn't need to imagine a specific scenario to understand that there's risk, but I think it would help me convince other folks if I did.