by aabdi
4 days ago
Different models do slight variants. Usually it's done in post-training, to enforce behavior based on the prompt, e.g. a system prompt with thinking:max, thinking:low, or whatever. Enforcement then goes via constrained decoding: checking for the think start and end tokens with max lengths, or other variations.
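Rough sketch of the decoding-side enforcement, in case it helps. Everything here is made up for illustration: the `<think>` / `</think>` token strings, the `BUDGETS` mapping from prompt setting to max length, and `toy_model` standing in for a real sampler. Real models will differ; this only shows the "force the end token once the budget is hit" idea.

```python
from typing import Callable, List

# Hypothetical marker tokens; real models use their own special tokens.
THINK_START = "<think>"
THINK_END = "</think>"
EOS = "<eos>"

# Hypothetical mapping from a prompt setting to a thinking-token budget.
BUDGETS = {"low": 2, "max": 8}


def generate(next_token: Callable[[List[str]], str],
             budget: int,
             max_tokens: int = 32) -> List[str]:
    """Decode token by token, force-closing the think span at the budget."""
    out: List[str] = []
    thinking = False
    used = 0  # thinking tokens emitted since the last THINK_START
    while len(out) < max_tokens:
        if thinking and used >= budget:
            # Budget exhausted: override the model and emit the end token.
            tok = THINK_END
        else:
            tok = next_token(out)
        out.append(tok)
        if tok == THINK_START:
            thinking, used = True, 0
        elif tok == THINK_END:
            thinking = False
        elif tok == EOS:
            break
        elif thinking:
            used += 1
    return out


def toy_model(prefix: List[str]) -> str:
    """Stand-in for a sampler: thinks forever unless the span is closed."""
    if not prefix:
        return THINK_START
    if THINK_END in prefix:
        return "answer" if prefix[-1] == THINK_END else EOS
    return "hmm"


print(generate(toy_model, BUDGETS["low"]))
```

The model on its own would never stop thinking here, so the cutoff comes entirely from the decoding loop. In practice the post-training part is what makes the model wrap up on its own before the hard cap kicks in.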