HN Mirror


by Someone 6 days ago

FTA: “The final authority must sit behind a deterministic, non-bypassable gate. AI must never hold direct permissions for destructive, irreversible actions (deleting a production database, moving funds, pushing to prod). So the last line of defense must always be either human oversight or a deterministic script with no AI workarounds.”

That’s fine in theory, but it won’t fly in practice for all destructive, irreversible actions. For example, how do you prevent a chatbot from generating a highly insulting or racist remark, or incorrect or illegal advice, that will later cost you millions?

Human oversight is (deemed) too expensive.

A deterministic script can detect known profanities, but may suffer from a variant of the Scunthorpe problem (https://en.wikipedia.org/wiki/Scunthorpe_problem), and won’t detect unknown profanities or creative insults that don’t use any words considered profane. A deterministic script is also very bad at detecting legal issues in responses.
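To make both failure modes concrete, here is a minimal sketch in Python. The blocklist, the function names, and the sample sentences are all made up for illustration, and a mild word stands in for real profanity. It is not meant as a real moderation filter.

```python
import re

# Hypothetical one-word blocklist; a mild word stands in for real profanity.
BLOCKLIST = {"hell"}


def naive_filter(text: str) -> bool:
    """Flag text if a blocked word appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKLIST)


def word_boundary_filter(text: str) -> bool:
    """Flag text only if a blocked word appears as a whole word."""
    alternatives = "|".join(re.escape(word) for word in sorted(BLOCKLIST))
    pattern = r"\b(?:" + alternatives + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


samples = [
    "Hello, I collect shells",     # harmless, but contains the blocked substring
    "Go to hell",                  # the known profanity, spelled normally
    "Go to h e l l",               # the same word, trivially obfuscated
    "You are a waste of oxygen",   # insulting without any blocked word
]

for sample in samples:
    print(f"{sample!r}: naive={naive_filter(sample)}, "
          f"boundary={word_boundary_filter(sample)}")
```

Substring matching flags the harmless first sentence (the Scunthorpe-style false positive). Whole-word matching fixes that, yet neither version catches the obfuscated spelling or the insult that uses no blocked word at all.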

“Don’t deploy a chatbot” would work for that, but for many, that doesn’t seem to be an option.

1 comment

taleodor 6 days ago

The point isn't that we should drop the LLM from the mix completely; rather, something like AI -> LLM control -> old-school classifier control -> script / human oversight is the way to go. If something has the potential to cause millions in damages, it should be subject to human oversight (the likelihood / impact analysis needs to happen early in the system design).
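A rough sketch of what such a layered chain could look like, in Python. Every name here (Action, llm_control, classifier_control, make_final_gate, run_pipeline), the keyword rule, and the one-million threshold are assumptions for illustration only. The two model-based stages are stubs; a real system would call an actual reviewer model and a trained classifier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    ESCALATE = "escalate"  # hand the action to a human reviewer


@dataclass
class Action:
    description: str
    # Assumed to come from a likelihood / impact analysis done at design time.
    estimated_impact_usd: float


Check = Callable[[Action], Verdict]


def llm_control(action: Action) -> Verdict:
    """Stub for a second LLM reviewing the proposed action (always allows here)."""
    return Verdict.ALLOW


def classifier_control(action: Action) -> Verdict:
    """Stub for an old-school classifier; a keyword rule stands in for a model."""
    if "insult" in action.description.lower():
        return Verdict.BLOCK
    return Verdict.ALLOW


def make_final_gate(threshold_usd: float) -> Check:
    """Deterministic last gate: high-impact actions always go to a human."""
    def gate(action: Action) -> Verdict:
        if action.estimated_impact_usd >= threshold_usd:
            return Verdict.ESCALATE
        return Verdict.ALLOW
    return gate


def run_pipeline(action: Action, checks: List[Check]) -> Verdict:
    """Run the checks in order and stop at the first non-ALLOW verdict."""
    for check in checks:
        verdict = check(action)
        if verdict is not Verdict.ALLOW:
            return verdict
    return Verdict.ALLOW


checks = [llm_control, classifier_control, make_final_gate(1_000_000)]

print(run_pipeline(Action("answer a greeting", 0.0), checks))
print(run_pipeline(Action("insult the customer", 10.0), checks))
print(run_pipeline(Action("move funds", 2_000_000.0), checks))
```

The final gate never consults a model, so nothing upstream can talk it out of escalating. Whether an action ever reaches a human depends only on the impact estimate assigned during system design.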
