| HN Mirror


	by danielmarkbruce 523 days ago
	So much work goes into RLHF, DPO, instruct tuning, and other kinds of post-training because UX is very important. The bar is high, and making a model with a good UX for a given use case is hard.