by HarHarVeryFunny 810 days ago
Yes, and one could imagine, at least for a while, an escalating arms race between better detectors and people developing prompts capable of inducing responses that circumvent the detector. Still, it seems to be the only safety approach with any hope of working if you are shipping systems that can learn at runtime, and of course future systems will become much better at this (full online learning, not just in-context).