|
|
|
|
|
by pyentropy
4 days ago
|
|
Take a look at the harmony repo which specifies the internal OpenAI format - the effort level is specified in the context after the <|start|> tag - https://github.com/openai/harmony Note that inference libs also have parsers that put hard limits on reasoning tokens with separate counters (similar to how you can put a limit on token generation per completion versus waiting for an <eos>). For that, take a look at vllm reasoning docs. |
|
https://docs.vllm.ai/en/latest/features/reasoning_outputs/#a...
https://developers.openai.com/api/docs/guides/reasoning