by Yokohiii 3 days ago
Of course they can, but if you give an LLM a specific rule, all that happens is that the probability of it following the rule shifts. Prohibiting a rule violation outright is technically impossible via prompting.

Humans make mistakes and forget things as well. But we learn not to rush on stairs and not to touch hotplates. A few bruises later that wisdom is permanent, and at some point we don't even need to fail at everything ourselves to accept the rules.

An LLM is permanently at risk of breaking every rule it is given.
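
To put rough numbers on that, here is a toy sketch, not a measurement. It assumes each call follows the rule with some fixed probability and that calls are independent, which is a simplification. The function name and the probabilities are made up for illustration.

```python
def violation_probability(p_follow: float, n_calls: int) -> float:
    """Chance of at least one rule violation across n_calls calls.

    Assumes (hypothetically) that every call follows the rule with the
    same probability p_follow and that calls are independent.
    """
    if not 0.0 <= p_follow <= 1.0:
        raise ValueError("p_follow must be between 0 and 1")
    if n_calls < 0:
        raise ValueError("n_calls must be non-negative")
    # P(no violation in n calls) = p_follow ** n_calls
    return 1.0 - p_follow ** n_calls


if __name__ == "__main__":
    # Illustrative values only: a better prompt might move p_follow
    # from 0.99 to 0.999, but not to exactly 1.0.
    for p_follow in (0.99, 0.999):
        for n_calls in (1, 100, 1000):
            risk = violation_probability(p_follow, n_calls)
            print(f"p_follow={p_follow} n_calls={n_calls} risk={risk:.3f}")
```

Under those assumptions a better prompt only pushes the risk further out. It reaches zero only if the per-call probability is exactly 1, which is the thing prompting can't give you.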