by gradys 1099 days ago
My intuition on this:

Maximum likelihood training -> faithfully represent training data

Reinforcement learning -> seek out the most preferred answer you can
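
To make that contrast concrete, here is a toy sketch, not anything from a real training setup. It assumes a three-way categorical "model" parameterized by softmax logits, and the names and numbers (`data_freq`, `rewards`, `train_mle`, `train_rl`) are all made up for illustration. The MLE loop follows the exact log-likelihood gradient; the RL loop follows the exact gradient of expected reward (a policy gradient with no sampling noise).

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_mle(data_freq, steps=2000, lr=0.5):
    # Gradient ascent on expected log-likelihood.
    # For softmax logits the gradient is (data frequency - model probability).
    logits = [0.0] * len(data_freq)
    for _ in range(steps):
        p = softmax(logits)
        logits = [l + lr * (f - q) for l, f, q in zip(logits, data_freq, p)]
    return softmax(logits)

def train_rl(rewards, steps=2000, lr=0.5):
    # Gradient ascent on expected reward E[r] = sum_i p_i * r_i.
    # For softmax logits the gradient is p_i * (r_i - E[r]).
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        p = softmax(logits)
        expected = sum(q * r for q, r in zip(p, rewards))
        logits = [l + lr * q * (r - expected)
                  for l, q, r in zip(logits, p, rewards)]
    return softmax(logits)

# Hypothetical numbers: how often each answer appears in the data,
# and how much a rater prefers each answer.
data_freq = [0.5, 0.3, 0.2]
rewards = [0.2, 1.0, 0.5]

mle_policy = train_mle(data_freq)
rl_policy = train_rl(rewards)

print("MLE policy:", [round(p, 3) for p in mle_policy])
print("RL policy: ", [round(p, 3) for p in rl_policy])
```

Under these assumptions the MLE policy should end up close to the data frequencies, while the RL policy should pile nearly all of its mass on the highest-reward answer, even though that answer is not the most common one in the data.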