Hacker News
by yorwba 19 days ago
Yes, if your LLM sandbox had a huge hole in it guarded only by asking an LLM whether the stuff coming out is low-risk, you would indeed get sand into all kinds of inconvenient places.

So don't do that. If you want to sandbox an LLM, all output of any consequence needs to pass through a human brain qualified to evaluate whether those consequences are desirable or not. If you don't want to do that because reading LLM output is exhausting, you're free to discover the consequences in some other way, but that doesn't mean sandboxing isn't a solution. It just comes with the tradeoff that you can't outsource all decisions to LLMs.
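
To make the shape of that concrete, here is a minimal sketch of a human gate, not anyone's actual implementation. The names `ProposedAction`, `human_gate`, and the `ask` parameter are all made up for illustration; the only point is that the side effect is deferred and the default answer is "no".

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProposedAction:
    """Hypothetical wrapper for something the LLM wants to do outside the sandbox."""
    description: str          # plain-language summary shown to the reviewer
    run: Callable[[], str]    # the side effect itself, deferred until approved


def human_gate(action: ProposedAction,
               ask: Callable[[str], str] = input) -> Optional[str]:
    """Run `action` only if a human explicitly answers 'y'; otherwise do nothing."""
    answer = ask(f"LLM wants to: {action.description}\nApprove? [y/N] ")
    if answer.strip().lower() != "y":
        return None           # default is deny: nothing leaves the sandbox
    return action.run()
```

Note that `ask` is a person at a prompt here, not another model call. Swapping in an LLM for `ask` is exactly the hole described above.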