Hacker News
by ProofHouse 305 days ago
Well, they can be used together in some contexts. So while they are different, you could also say RL can complement supervised fine-tuning (SFT) by optimizing the model further.