Hacker News
by PaulDavisThe1st 484 days ago
> could circumvent the rules over failing the task.

or the whole thing is just a reflection of the rules being incorrectly specified. As others have noted, minor variations in how the rules are described can lead to wildly different outcomes. We might want to label an LLM's behavior as "circumventing", but that may be because our own understanding of what the rules allow and disallow is incorrect (at least compared to the LLM's "understanding").