by KevinBenSmith 1135 days ago
I had similar thoughts about the general concept of using AI to automate AI Safety.

I really like their approach, and in this particular case they do have a way to score the explainer model. I think it could be very valuable for various AI Safety issues.

However, I don’t yet see how it can help with what is potentially the biggest danger: a superintelligent AGI being created that is not aligned with humans. The newly created AGI might be 10x more intelligent than the explainer model, to such an extent that the explainer model is not capable of understanding any tactics deployed by the superintelligent AGI. In the same way, ants are most probably not capable of explaining the tactics deployed by humans, even if we gave them 100 years to figure it out.

1 comment

Is the safest thing to do to stop inventing and building more powerful and potentially dangerous systems which we can’t understand?