Hacker News
by NumberWangMan 1158 days ago
Sorry to get heavy here: truth is not an NLP problem, it's an alignment problem. We want truth, but we don't have a reliable way to train an AI to provide the truth, only to provide things that are either true, or sound true enough that they fool the reward function. And even then, that may not be exactly what the AI learns to do, because there's another level of alignment problem, the "inner alignment" or "mesa-optimizer alignment" problem!

With an AI like GPT, the problem is quirky and amusing. Once AIs get really powerful, it becomes scary, and a lot of people who understand this field much better than I do are worried it has a good chance of being deadly. Like, potentially kill-everyone-on-earth deadly.

1 comments

Hard agree. I'm really trying to figure out how to inject this idea into my friends' heads effectively. The main struggle I'm facing is how to convey the danger behind it. Why can it be deadly exactly? What can a program actually do to harm people, to the level where it's a risk of extinction or societal collapse?

Personally I didn't need to imagine a specific scenario to understand that there's risk, but I think it would help me convince other folks if I did.

> risk of [...] or societal collapse

If you want society to collapse, all you need to do is succeed in having AI automate all jobs.

Every single country where money comes from somewhere other than people (oil, diamonds...) is an authoritarian nightmare simply because keeping people happy is not necessary.

Once AI can do everything, and robots that can do any physical labor are developed, the population will shrink dramatically as people with killer robots kill each other for resources. There is no need for AI rebellion or AI failure to get there.