by imjonse
793 days ago
Phi-2 was indeed not instruct-tuned. From this report: "Our models went through post-training with both supervised instruction fine-tuning, and preference tuning with DPO. We have worked on generating and curating various instruction and preference data. This has improved the model chat capabilities, robustness, as well as its safety."