Hacker News
by dgb23 73 days ago
For those who are wondering: these LLMs are trained on special delimiters that mark the different sources of messages. There's typically something like [system][/system], then one each for agent, user and tool. The delimiters also come in different shapes.
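
To make that concrete, here is a rough sketch of how a message list gets flattened into one delimited string. The [role]...[/role] shape and the render_prompt helper are made up for illustration; every model family defines its own special tokens, so check the model's chat template for the real ones.

```python
# Hypothetical delimiter format: real models each define their own special tokens.
def render_prompt(messages):
    """Flatten (role, content) pairs into a single delimited prompt string."""
    parts = []
    for role, content in messages:
        parts.append(f"[{role}]{content}[/{role}]")
    # End with an open agent tag so the model continues writing as the agent.
    parts.append("[agent]")
    return "".join(parts)


messages = [
    ("system", "You are a helpful assistant."),
    ("user", "What is 2 + 2?"),
]
prompt = render_prompt(messages)
print(prompt)
```

The model only ever sees that one flat string. The "roles" exist purely as delimiter tokens inside it.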

You can even construct a raw prompt and describe your own messaging structure in the prompt itself. During my initial tinkering with a local model I did it this way because I didn't know about the special delimiters. It actually kind of worked and I got it to call tools. It was just less reliable. It also did some weird stuff, like repeating the problem statement it was supposed to act on with a tool call, and it got into loops where it posed itself similar problems and then tried to fix them with tool calls. Very weird.
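
A minimal sketch of what such a hand-rolled structure could look like. The PROTOCOL text, the CALL: line format and parse_tool_call are all invented for this example, not taken from any real model or library. The parser returns None whenever the output doesn't follow the format, because nothing forces the model to comply.

```python
import json
import re

# Hypothetical protocol description that would be pasted at the top of the raw prompt.
PROTOCOL = (
    "Messages are lines starting with USER:, AGENT: or TOOL_RESULT:.\n"
    "To call a tool, reply with exactly one line of the form\n"
    'CALL: {"tool": "<name>", "args": {...}}\n'
)

# A tool call is a whole line: the CALL: prefix followed by a JSON object.
CALL_RE = re.compile(r"^CALL:\s*(\{.*\})\s*$", re.MULTILINE)


def parse_tool_call(model_output):
    """Return (tool_name, args), or None if the output breaks the protocol."""
    match = CALL_RE.search(model_output)
    if match is None:
        return None
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or "tool" not in call:
        return None
    return call["tool"], call.get("args", {})


good = parse_tool_call('Sure.\nCALL: {"tool": "add", "args": {"a": 1, "b": 2}}')
bad = parse_tool_call("I will now add the numbers for you.")
print(good, bad)
```

In a real loop you would also want a cap on the number of tool-call rounds, since the model can keep generating new calls indefinitely.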

In any case, I think the lesson here is that it's all just probabilistic. When it works and the agent does something useful or even clever, then it feels a bit like magic. But that's misleading and dangerous.

1 comment

i think that a wasteful but good solution would be to tag each token, not use opening/closing tags.

whatever n-dimensional space the tokens occupy, manually add more dimensions, to reflect user/agent, trusted/untrusted input.

it should be much harder for the LLM to fuck up this way if every single word it reads screams "suspicion" or "trust". with tag tokens at the start it can just forget them.
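
a toy sketch of the per-token tagging idea, using plain lists instead of a real model. the role list, the 2-dimensional embeddings and the helper names are all assumptions for illustration. a model would have to be trained with these extra dimensions for them to mean anything, and whether that actually makes it more robust is untested here.

```python
# Hypothetical role set; the extra dimensions are a one-hot role plus a trust flag.
ROLES = ["system", "user", "agent", "tool"]


def tag_embedding(embedding, role, trusted):
    """Append a one-hot role vector and a trusted/untrusted flag to one token embedding."""
    role_one_hot = [1.0 if r == role else 0.0 for r in ROLES]
    trust_flag = [1.0 if trusted else 0.0]
    return embedding + role_one_hot + trust_flag


def tag_sequence(token_embeddings, role, trusted):
    """Tag every token of a message, so the source is never more than zero tokens away."""
    return [tag_embedding(e, role, trusted) for e in token_embeddings]


# Toy 2-dimensional embeddings standing in for real ones.
system_tokens = [[0.1, 0.2], [0.3, 0.4]]
tool_tokens = [[0.5, 0.6]]

sequence = (
    tag_sequence(system_tokens, "system", trusted=True)
    + tag_sequence(tool_tokens, "tool", trusted=False)
)
for vector in sequence:
    print(vector)
```

the wasteful part is visible right away: each token grows by len(ROLES) + 1 dimensions, whether or not the model needs the reminder.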