


	by imjonse 793 days ago
	From this report (Phi-2 was indeed not instruction-tuned): "Our models went through post-training with both supervised instruction fine-tuning, and preference tuning with DPO. We have worked on generating and curating various instruction and preference data. This has improved the model chat capabilities, robustness, as well as its safety."