

by gradys 1099 days ago

My intuition on this:

Maximum likelihood training -> faithfully represent the distribution of the training data

Reinforcement learning -> seek out the most preferred answer you can
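
To make the contrast concrete, here is a toy sketch of my own, not anything from a real training setup. It assumes a model that picks one of three answers through a softmax over logits. The counts, the rewards, and the function names (`fit_mle`, `fit_rl`) are all made up for illustration, and the RL part uses the exact expected-reward gradient rather than sampled rollouts.

```python
import math


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fit_mle(counts, steps=2000, lr=0.5):
    """Gradient ascent on the average log-likelihood of the observed answers."""
    n = sum(counts)
    freqs = [c / n for c in counts]
    logits = [0.0] * len(counts)
    for _ in range(steps):
        probs = softmax(logits)
        # d(avg log-likelihood)/d(logit_i) = empirical frequency - model probability
        logits = [l + lr * (f - p) for l, f, p in zip(logits, freqs, probs)]
    return softmax(logits)


def fit_rl(rewards, steps=2000, lr=0.5):
    """Gradient ascent on expected reward (exact policy gradient, no sampling)."""
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        probs = softmax(logits)
        expected = sum(p * r for p, r in zip(probs, rewards))
        # d(expected reward)/d(logit_i) = p_i * (r_i - expected reward)
        logits = [l + lr * p * (r - expected)
                  for l, p, r in zip(logits, probs, rewards)]
    return softmax(logits)


# Hypothetical data: how often each answer appears in the training set,
# and how much a hypothetical rater prefers each answer.
counts = [60, 30, 10]
rewards = [0.2, 1.0, 0.5]

mle_probs = fit_mle(counts)
rl_probs = fit_rl(rewards)
print("MLE policy:", [round(p, 3) for p in mle_probs])
print("RL policy: ", [round(p, 3) for p in rl_probs])
```

Under these assumptions the maximum likelihood fit settles on the empirical frequencies (0.6, 0.3, 0.1), while the reward-maximizing fit piles nearly all of its probability onto the single highest-reward answer, regardless of how often that answer showed up in the data.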