|
|
|
|
|
by NitpickLawyer
383 days ago
|
|
It depends. Frontier coding LLMs have been trained to perform well in an "agentic" loop, where they try things, look at the logs, find alternatives when the first thing didn't work, and so on. There's still debate on how much actual learning is in ICL (in context learning), but the effects are clear for anyone that has tried them. It sometimes works surprisingly well. I can totally see a way for such a loop to reach a point where it bypasses a poorly design guardrail (i.e. blacklists) by finding alternatives, based on the things it's previously tried in the same session. There is some degree of generalisation in these models, since they work even on unseen codebases, and with "new" tools (i.e. you can write your own MCP on top of existing internal APIs and the "agents" will be able to use them, see the results and adapt "in context" based on the results). |
|
"Claude has learned" nothing. "Claude can sometimes jailbreak if x or y happens in a session" is something else.