by lsy
1174 days ago
Is there some way of holding the LLM response to a given prompt constant? It sounds like a lot of this relies on the LLM getting the right answer at each step in a sequence, so I'm guessing they do something like keep the temperature at 0? Otherwise you are going to wind up with possibly different behavior run-to-run. And even if they do have something like the above, don't we end up with potentially breaking changes once models are updated? Basically the issue is that even if you can guarantee response format X for prompt A, there is no guarantee that a slightly modified prompt A' will produce a response in format X, even on the same model. You can also imagine that the more "Tools" are available, the lower the chance that the model will pick the right one based on its English text description. Would be interesting to know how this is being addressed.

- Rerun the prompt until the output comes back in a consistent format
- Steer the output token selection towards a predefined format
For the latter, I've built a proof of concept that takes in a JSON schema along with a Hugging Face transformer model and constrains token selection by modifying the output probabilities so that only schema-valid tokens can be emitted; see "Structural Alignment: Modifying Transformers (like GPT) to Follow a JSON Schema" @ https://github.com/newhouseb/clownfish. Unfortunately, given OpenAI's current API, this is only possible on locally run models. That is... at any level of cost effectiveness. It's technically possible against their current APIs, but (worst case) quadratically expensive.
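
To make the second option concrete, here is a rough, self-contained toy sketch of the masking idea. It is not the clownfish code: the vocabulary, the `fake_logits` stand-in for a model, and the two-string "schema" are all made up for illustration. The only point is that invalid tokens get their score set to negative infinity before the next token is picked, so the output can only ever be something the schema accepts.

```python
import math
import random

# Toy vocabulary (hypothetical); a real tokenizer has tens of thousands of entries.
VOCAB = ['{"', 'tool', '": "', 'search', 'calc', '"}', 'hello', ' ', '}', 'sea', 'rch']

# Every string accepted by a made-up schema: {"tool": "search" | "calc"}.
VALID_OUTPUTS = ['{"tool": "search"}', '{"tool": "calc"}']


def fake_logits(prefix, rng):
    """Stand-in for a model forward pass: one arbitrary score per vocab token."""
    return [rng.uniform(-5.0, 5.0) for _ in VOCAB]


def is_valid_prefix(text):
    """True if `text` can still be extended into some schema-valid output."""
    return any(out.startswith(text) for out in VALID_OUTPUTS)


def constrained_decode(seed=0, max_steps=32):
    """Greedy decoding where schema-invalid tokens are masked out at every step."""
    rng = random.Random(seed)
    text = ""
    for _ in range(max_steps):
        if text in VALID_OUTPUTS:
            break  # a complete, valid document has been produced
        logits = fake_logits(text, rng)
        # Mask: keep the score only if appending the token keeps the text valid.
        masked = [
            score if is_valid_prefix(text + tok) else -math.inf
            for tok, score in zip(VOCAB, logits)
        ]
        best = max(range(len(VOCAB)), key=masked.__getitem__)
        if masked[best] == -math.inf:
            raise ValueError("no token in the vocabulary can continue the output")
        text += VOCAB[best]
    return text


if __name__ == "__main__":
    print(constrained_decode(seed=0))
```

Whatever scores the "model" produces, the result is one of the two valid JSON strings. With a locally run model you would apply the same mask to the real logits at each step; I believe transformers exposes hooks for this kind of thing (for example `prefix_allowed_tokens_fn` on `generate`), but treat that as a pointer rather than a recipe. Against a text-only API you would have to re-send the growing prefix for every token, which is where the worst-case quadratic cost comes from.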