A prompt-trained DeepSeek R1 70B can perform better than GPT-o1 using AdalFlow

Time to move to open-source, smaller reasoning models.

Here are the top three learnings from auto-prompt optimization of DeepSeek R1 LLaMA 70B for RAG:

1⃣ A prompt-trained DeepSeek R1 LLaMA 70B (R1 distilled) outperforms an untrained GPT-o1.

2⃣ The "reasoning" model is less susceptible to overfitting than non-reasoning models. In a comparison with GPT-3.5, both models start at the same accuracy and reach similar accuracy on the validation dataset. However, on the test dataset, R1 distilled often achieves much higher accuracy.

3⃣ R1 can think too long and run out of output tokens before finishing the task. The optimized prompt specifically added instructions for it to "think less."
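One way to spot the "ran out of output tokens" failure from the third learning is to check whether the model ever finished its reasoning. The sketch below is only an illustration, not part of AdalFlow: it assumes the model wraps its reasoning in `<think>` and `</think>` tags (which depends on how the model is served), and `parse_r1_output` and `ParsedOutput` are hypothetical names made up for this example.

```python
from typing import NamedTuple

# Assumed reasoning delimiters; adjust if your serving stack formats output differently.
THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"


class ParsedOutput(NamedTuple):
    reasoning: str   # text inside the think block (may be partial)
    answer: str      # text after the think block, empty if none was produced
    truncated: bool  # True when reasoning was found but no final answer followed


def parse_r1_output(text: str) -> ParsedOutput:
    """Split a raw completion into reasoning and answer, flagging cut-off outputs."""
    if THINK_CLOSE in text:
        reasoning, answer = text.split(THINK_CLOSE, 1)
        reasoning = reasoning.replace(THINK_OPEN, "", 1).strip()
        answer = answer.strip()
        # Reasoning closed but nothing came after it: treat as truncated.
        return ParsedOutput(reasoning, answer, truncated=(answer == ""))
    if THINK_OPEN in text:
        # Reasoning started but never closed: the token budget likely ran out.
        reasoning = text.split(THINK_OPEN, 1)[1].strip()
        return ParsedOutput(reasoning, "", True)
    # No reasoning tags at all: treat the whole text as the answer.
    return ParsedOutput("", text.strip(), False)


if __name__ == "__main__":
    complete = parse_r1_output("<think>Check the context.</think>Paris")
    cut_off = parse_r1_output("<think>Let me reconsider this once more")
    print(complete.truncated, cut_off.truncated)
```

Counting how many evaluation samples come back with `truncated` set gives a rough signal for whether a "think less" instruction or a larger output-token limit is needed.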