by BoorishBears
1042 days ago
I just left a comment along these lines, but realistically it's probably cheaper to just re-emit than to add the machinery that enables this to their existing architecture. At most I could have seen them maybe running a schema validator against the output and re-requesting on your behalf, but even that's probably cheaper for them to do client side (I will say, I'm surprised their API wrapper hasn't been updated to do this yet).
This is the part that blows my mind. You don't have to do this! You don't have to sample the entire output and then validate it after the fact.
You're not required to greedily pick the token with the highest score. You get the scores of all tokens on every forward pass! So why even waste time picking invalid tokens if you're just going to validate and retry later on?
(Note: when I say "you" here, I mean whoever is hosting the model. It is true that OpenAI does not expose all token scores; it only gives you back the token it picked. So a client-side library is not able to perform this grammar-based sampling.
BUT, OpenAI themselves host the model, and they see all token outputs, with all scores. And in the same API request, they allow you to pass the "function definition" as a JSON schema. So why not simply apply that function definition as a mask on the token outputs? They could do this without exposing all token scores to you, which they seem very opposed to for some reason.)
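To make the masking idea concrete, here is a rough toy sketch in Python. Everything in it is made up for illustration: `VOCAB`, `ALLOWED`, and `fake_logits` are hypothetical stand-ins, not anything from OpenAI's stack. A real implementation would compile the JSON schema into a grammar or state machine; here a simple prefix check against a short list of permitted outputs plays that role.

```python
import math

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ['{"unit": "', 'celsius', 'fahrenheit', 'kelvin', '"}', 'Sure', '!']

# Outputs the (assumed) schema permits. A real system would derive this
# constraint from the JSON schema instead of enumerating strings.
ALLOWED = ['{"unit": "celsius"}', '{"unit": "fahrenheit"}']

# Made-up scores that deliberately favour invalid tokens.
SCORES = {'Sure': 5.0, 'kelvin': 4.0, '!': 3.0, '{"unit": "': 2.0,
          'celsius': 1.0, 'fahrenheit': 0.5, '"}': 0.0}


def fake_logits(prefix):
    """Stand-in for a forward pass: one score per vocabulary entry."""
    return [SCORES[token] for token in VOCAB]


def greedy_token(prefix):
    """Unconstrained greedy choice: the highest-scoring token, valid or not."""
    logits = fake_logits(prefix)
    return VOCAB[max(range(len(VOCAB)), key=logits.__getitem__)]


def is_valid_next(prefix, token):
    """A token is valid if the extended text is still a prefix of an allowed output."""
    candidate = prefix + token
    return any(allowed.startswith(candidate) for allowed in ALLOWED)


def constrained_decode(max_steps=10):
    """Greedy decoding where invalid tokens are masked out before picking."""
    out = ""
    for _ in range(max_steps):
        if out in ALLOWED:
            break
        logits = fake_logits(out)
        # The mask: invalid tokens get a score of -inf, so they can never win.
        masked = [score if is_valid_next(out, token) else -math.inf
                  for score, token in zip(logits, VOCAB)]
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        if masked[best] == -math.inf:
            raise ValueError("no valid continuation")
        out += VOCAB[best]
    return out


print(greedy_token(""))      # the unconstrained pick is an invalid token
print(constrained_decode())  # the masked pick always satisfies the constraint
```

The scores are never shown to the caller; only the mask is applied server side, which is the point above. The same mask works with temperature sampling instead of argmax: renormalize over the tokens that survive it.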