Can we bootstrap AI Safety despite being unable to even define it? (arxiv.org)
2 points by cryptohell 219 days ago
2 comments

AI output is modeled on human behavior. Are humans safe?
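Given several models, and assuming only that some unknown subset of them is "safe", can we construct a single model that is as safe as that subset? If so, obtaining a trustworthy model reduces to a plausibly easier task: making sure some of the candidates are safe, without needing to know which ones.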