by ssrlcc 885 days ago
AGI safety from first principles [1] is a good write-up.

You can read more about instrumental convergence, reward misspecification, goal misgeneralization, and inner misalignment, which are some of the specific problems AI safety people care about, by skimming the curriculum of the AI Alignment Course [2], which links to several relevant blog posts and papers on these topics.

[1] https://www.alignmentforum.org/s/mzgtmmTKKn5MuCzFJ
[2] https://course.aisafetyfundamentals.com/alignment

1 comment

Is there a clear argument that I can read in under 15 minutes? If such an argument exists somewhere, can you point me to it?

Also note that we were talking about modern-day LLMs and their descendants here, not science-fiction AGIs. Unless, of course, you have an argument for how one of these LLMs somehow develops into an AGI.