| HN Mirror


by ssrlcc 885 days ago

AGI safety from first principles [1] is a good write-up.

You can read more about instrumental convergence, reward misspecification, goal misgeneralization, and inner misalignment, which are some of the specific problems AI safety people care about, by skimming the curriculum of the AI Alignment Course [2], which points to several relevant blog posts and papers on these topics.

[1] https://www.alignmentforum.org/s/mzgtmmTKKn5MuCzFJ
[2] https://course.aisafetyfundamentals.com/alignment

1 comment

baobabKoodaa 885 days ago

Is there a clear argument that I can read in 15 minutes or less? If such an argument exists somewhere, can you point to it?

Also note that we were talking about modern-day LLM AIs here, and their descendants. We were not talking about science fiction AGIs. Unless, of course, you have an argument as to how one of these LLMs somehow descends into an AGI.
