Hacker News
by BoorishBears 305 days ago
My 2nd most recent submission has a link to it

Most of it has been fine-tuning (SFT/DPO/GRPO), but there's also a lot of prompting and adding steps between the user's prompt and the output.