Maximum likelihood training -> faithfully represent training data
Reinforcement learning -> seek out the most preferred answer you can
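
The contrast above can be sketched on a toy problem with three candidate answers. This is a minimal illustration under stated assumptions, not a real training setup: the policy is a softmax over three logits, and the names `data_freq`, `rewards`, `train_mle`, and `train_rl` are hypothetical, as are the numbers. Both loops use exact gradients (no sampling), so the RL loop is an idealized policy-gradient update.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_mle(data_freq, steps=2000, lr=0.5):
    # Gradient ascent on the expected log-likelihood sum_i q_i * log p_i,
    # where q is the empirical answer distribution in the training data.
    logits = [0.0] * len(data_freq)
    for _ in range(steps):
        p = softmax(logits)
        # d/d logit_j of sum_i q_i * log p_i is q_j - p_j (q sums to 1).
        logits = [l + lr * (q - pj) for l, q, pj in zip(logits, data_freq, p)]
    return softmax(logits)

def train_rl(rewards, steps=2000, lr=0.5):
    # Exact policy-gradient ascent on the expected reward sum_i p_i * r_i.
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        p = softmax(logits)
        expected = sum(pj * r for pj, r in zip(p, rewards))
        # d/d logit_j of the expected reward is p_j * (r_j - expected).
        logits = [l + lr * pj * (r - expected)
                  for l, pj, r in zip(logits, p, rewards)]
    return softmax(logits)

# Hypothetical numbers: answer 0 is most common in the data,
# answer 1 is the one the reward (preference) signal likes best.
data_freq = [0.5, 0.3, 0.2]
rewards = [0.2, 1.0, 0.5]

mle_policy = train_mle(data_freq)
rl_policy = train_rl(rewards)

print("MLE policy:", [round(p, 3) for p in mle_policy])
print("RL policy: ", [round(p, 3) for p in rl_policy])
```

In this sketch the maximum likelihood policy settles on the data frequencies, including the less preferred answers, while the RL policy moves almost all of its probability onto the single highest-reward answer, regardless of how often that answer appeared in the data.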