by mks_shuffle
419 days ago
Does anyone have insights on the best way to compare reasoning models? The usual advice is to use a higher temperature for more creative answers and a lower temperature for more logical, deterministic outputs. However, I am not sure how well that advice applies to reasoning models. For example, DeepSeek-R1 and QwQ-32B both recommend a temperature around 0.6 rather than lower values like 0.1–0.3. The Qwen3 blog compares performance across multiple reasoning models, and I would like to know what sampling configurations they used for each one, but the paper is not available yet. If anyone has links to papers focused on this topic, please share them here. Also, please feel free to correct me if I'm mistaken about anything. Thanks!
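For context on why the temperature value matters when comparing models, here is a minimal sketch of what temperature does at sampling time: the logits are divided by the temperature before the softmax, so lower values concentrate probability on the top token. The logits below are made up for illustration and the function name is hypothetical; this is not any model's or library's actual sampling code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate tokens (an assumption, not real model output).
logits = [2.0, 1.0, 0.5]

for t in (0.1, 0.6, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.1 nearly all of the probability mass sits on the top token, so sampling is close to greedy decoding, while 0.6 still leaves some room for the lower-ranked tokens.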