by impossiblefork
318 days ago
Fine, but to me reasoning is the thing where you have <think> tags and use RL to decide what gets generated in between them. Of course, people regarded things like GSM8K with trained reasoning traces as reasoning too, but it's pretty obviously not quite the same thing.