Hacker News
by csense 1120 days ago
I went in expecting progress on "alignment" as in "how to make sure AI doesn't kill us all" and I saw nothing at all about that in the paper. Disappointing.

Using the term "alignment" for what they're trying to do is misleading.

2 comments

Your understanding of alignment is somewhat out of date. Training a model to produce human-valued responses and training a model not to decide to destroy all the humans are not separate problems. RLHF may actually be an excellent solution to many of the problems you care about for today's LLMs, even though it is done for a practical reason (we want LLMs that give useful answers to our questions) rather than an existential-risk one.
It's not misleading. It's the way the term is used in the field. The sense you have in mind is just another usage of the same term.