by mks_shuffle 419 days ago
Does anyone have insights on the best approaches for comparing reasoning models? The usual advice is to use a higher temperature for more creative answers and a lower temperature for more logical, deterministic outputs. However, I am not sure how well this advice applies to reasoning models. For example, DeepSeek-R1 and QwQ-32B both recommend a temperature around 0.6, rather than lower values like 0.1–0.3. The Qwen3 blog provides performance comparisons between multiple reasoning models, and I would like to know which sampling configurations they used, but the paper is not available yet. If anyone has links to papers focused on this topic, please share them here. Also, please feel free to correct me if I’m mistaken about anything. Thanks!
1 comments

Oh really? Should I set the temperature to 0.6 on QwQ-32B? Where did you get these numbers from?
These are the recommendations given under the usage guidelines on each model's Hugging Face page. QwQ-32B: https://huggingface.co/Qwen/QwQ-32B DeepSeek-R1: https://huggingface.co/deepseek-ai/DeepSeek-R1
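
For anyone wondering what the temperature setting discussed above actually changes, here is a minimal sketch in plain Python. It assumes the standard formulation where logits are divided by the temperature before the softmax. The logits are made up for illustration and do not come from either model, and the function name is hypothetical, not part of any library.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens (illustrative only).
logits = [2.0, 1.0, 0.5]

# Compare a low value, the 0.6 mentioned in the thread, and the neutral 1.0.
for t in (0.1, 0.6, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Lower temperatures concentrate probability on the top token, so sampling becomes nearly greedy, while 0.6 still leaves some mass on the alternatives. This only shows the mechanism; it does not say which value works best for a given reasoning model.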