by peakji
606 days ago
The model can already answer some tricky questions that other models (including GPT-4o) have failed to address, achieving a +5.56 improvement on the GPQA-Diamond dataset. Unfortunately, it has not yet managed to reproduce inference-time scaling. I will continue to explore different approaches!
Can you compare with just Qwen 32B with CoT?