Hacker News
by Imnimo 751 days ago
"Automated alignment research" suggests he's still interested in following the superalignment blueprint from OpenAI. So what do you do while you're waiting for the AI that's capable of doing alignment research for you to arrive? If you believe this is a viable path, what's the point of putzing around doing your own research when you'll allegedly have an army of AI researchers at your command in the near future?
4 comments

Well, I presume you have to figure out how to evaluate their output, especially for trustworthiness. And that's something you have to do the core of yourself, no matter how many AI researchers you'll have.
The premise of the plan is that evaluating output is easier than producing it, such that a human researcher could look at the AI researcher's output and tell if it's correct and trustworthy. If this is true, what else is there to figure out?
> what do you do while you're waiting for the AI that's capable of doing alignment research for you to arrive

Nobody interested in superalignment is interested in waiting until actually threatening AI gets here.

But that's the fundamental superalignment plan - train a human-level alignment researcher AI, run a bunch of them in parallel, and review their research output to see if they solve the alignment problem. You can't do the plan until the human-level alignment researcher AI already exists.
A large part of the idea is that you can develop techniques for aligning sub-human AI using even stupider AI and hope/pray that continues to generalize once you get to super-human AI being aligned by human-level AI.
Current systems are already (in a limited way) helping with alignment: Anthropic is using its AI to label the sparse features from their sparse autoencoder approach. I think the original idea of labeling neurons by AI came from William Saunders, who also left OpenAI recently.
I think his tweet can be read as "research in (1) scalable oversight, (2) weak-to-strong generalization, and (3) automated alignment".