Hacker News
by vrighter 60 days ago
But some tokens are disallowed only in certain contexts, not in others.

You might be talking about how to defuse a bomb, instead of building one. Or you might be talking about a bomb in a video game. Or you could be talking about someone being "da bomb!". Or maybe the history of certain types of bombs. Or a ton of other possible contexts. You can't just block the "bomb" token. Or the word "explosive" when followed by "device", or "rapid unscheduled disassembly contraption". You just can't enumerate the infinitely many wrong possibilities.

And there is no way to figure out which contexts the word is safe in.


I'm responding to:

> Fundamentally there's no way to deterministically guarantee anything about the output.

with the fact that you can, for example, force a network to output syntactically correct code, as long as you can syntax-check each token.

You just said an oxymoron right there.

If you're syntax checking every token, you're doing it AFTER the LLM has spat out its output. You didn't actually do anything to force the LLM to produce correct code. You just reject invalid output after the fact.

If you could force it to emit syntactically correct code, you wouldn't need to perform a separate manual syntax check afterwards.

No, you prevent the LLM from generating invalid tokens. That means you "force it to emit syntactically correct code".
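To make "prevent the LLM from generating invalid tokens" concrete, here is a minimal sketch of the check happening before a token is chosen rather than after the output is finished. Everything in it is an assumption for illustration: the toy vocabulary, the bracket-only "grammar", and the random scores standing in for a real model. It only guarantees that each prefix can still be extended to a balanced string, not that the result is a complete program.

```python
import random

PAIRS = {")": "(", "]": "["}

def is_valid_prefix(text):
    """True if text can still be extended to a balanced bracket string."""
    stack = []
    for ch in text:
        if ch in "([":
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recent unmatched opener.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
    return True

def generate(vocab, steps, rng):
    """Pick the best-scoring token among those that keep the prefix valid."""
    out = ""
    for _ in range(steps):
        # Hypothetical stand-in for model scores; a real model would go here.
        scores = [rng.random() for _ in vocab]
        allowed = [(s, t) for s, t in zip(scores, vocab)
                   if is_valid_prefix(out + t)]
        # Disallowed tokens are never candidates, so they cannot be emitted.
        out += max(allowed)[1]
    return out

# Toy vocabulary, including multi-character tokens that are sometimes invalid.
VOCAB = ["(", ")", "[", "]", "x", "])", "(]"]
result = generate(VOCAB, 20, random.Random(0))
```

The candidate set is never empty here because "x" and "(" are valid after any valid prefix; a real grammar would need the same property, or a fallback.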
How do you prevent it from generating specific things? My point is that you can't. And again, how do you stop it generating certain tokens, but only in certain contexts?
E.g. you ask it what's 2+2, and only allow it to generate digits in the response. Set other probabilities to 0, then sample the rest. This is trivial.
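A rough sketch of the "set other probabilities to 0, then sample the rest" step, for anyone following along. The vocabulary and logit values are made up, and `sample_constrained` is a hypothetical helper, not any library's API. Note that it takes the allowed set as an input; how that set gets chosen for a given prompt is outside what it shows.

```python
import math
import random

def sample_constrained(logits, vocab, allowed, rng):
    """Sample one token after removing all probability from disallowed tokens."""
    # Disallowed tokens get -inf, so their softmax weight is exactly 0.0.
    masked = [l if tok in allowed else -math.inf
              for l, tok in zip(logits, vocab)]
    if all(m == -math.inf for m in masked):
        raise ValueError("no allowed token in vocabulary")
    top = max(masked)
    # Unnormalized softmax weights; random.choices normalizes them.
    weights = [math.exp(m - top) for m in masked]
    return rng.choices(vocab, weights=weights, k=1)[0]

# Made-up vocabulary and logits; "four" is the unconstrained favorite.
vocab = ["4", "5", "four", "bomb", " "]
logits = [1.0, 0.2, 3.0, 2.5, 0.1]
digits = {t for t in vocab if t.isdigit()}

rng = random.Random(0)
samples = [sample_constrained(logits, vocab, digits, rng) for _ in range(1000)]
```

Every sample is a digit token, even though a non-digit token has the highest raw logit.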
You would need to somehow analyze the prompt, figure out that the user is asking for an addition of two numbers, and selectively enable that filter. If that filter was left enabled permanently then you'd just functionally have a calculator.

But the analysis of the prompt itself is not a task that can be reliably automated either, for the exact same reasons the original model couldn't consistently do addition properly.

So your solution has the exact same problem as the original. If you ask for an addition, you can't be sure that you will get numbers (you can't be sure the filter will always be enabled when needed). You just shifted the problem out to a separate thing to be "left as an exercise to the reader" and declared the problem trivial.