HN Mirror

	by BoorishBears 305 days ago
	My 2nd most recent submission has a link to it. Most of it has been fine-tuning (SFT/DPO/GRPO), but also a lot of prompting and adding steps between the user's prompt and the output.