Hacker News
by ssrlcc 884 days ago
AGI safety from first principles [1] is a good write-up.

To read more about instrumental convergence, reward misspecification, goal misgeneralization, and inner misalignment, which are some of the specific problems AI safety people care about, skim the curriculum of the AI Alignment Course [2]. It links to several relevant blog posts and papers on these topics.

[1] https://www.alignmentforum.org/s/mzgtmmTKKn5MuCzFJ

[2] https://course.aisafetyfundamentals.com/alignment