| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by pyentropy 52 days ago
	Take a look at the harmony repo which specifies the internal OpenAI format - the effort level is specified in the context after the <\|start\|> tag - https://github.com/openai/harmony Note that inference libs also have parsers that put hard limits on reasoning tokens with separate counters (similar to how you can put a limit on token generation per completion versus waiting for an <eos>). For that, take a look at vllm reasoning docs.

1 comments

pyentropy 51 days ago

Examples with inference of different reasoning effort levels is in the OpenAI docs as well - https://developers.openai.com/cookbook/articles/openai-harmo...

https://docs.vllm.ai/en/latest/features/reasoning_outputs/#a...

https://developers.openai.com/api/docs/guides/reasoning

link

simianwords 51 days ago

I think you have the right answer but I'm struggling to understand: does changing the effort change the prompt at the start of the conversation? I wonder why come up with this way at all? Why not just add a parameter at the end or something? At least it won't break cache.

Maybe like: add a secret suffix to your chat in the conversation to think more like

   conversation....

   Hey please help
   [think more]

link

pyentropy 51 days ago

I'm considering the possibility that it's good to break the prefix and cache because the LLM itself was rewarded (during post-training) with different prefixes/system prompts, each containing reasoning traces of the correct size.

I might be very very wrong though and LLMs disagree with me, insisting that cache is preserved and the system message doesn't have to change (even though it often contains effort level in context) if effort level changes across turns, and that all you have to do is tell the inference lib that parses think tags to early-close think tags that are too long.

link

simianwords 51 days ago

This seems correct but again I would like to think post training could have been also done by checking only the string in the last message sent.

link