by robrenaud 202 days ago
He is essentially expanding upon an idea raised by Andrej Karpathy on his podcast about a month prior.

Karpathy says that basically "RL sucks" and that it's like "sucking bits of supervision through a straw".

https://x.com/dwarkesh_sp/status/1979259041013731752/mediaVi...