Hacker News
by visarga 1290 days ago
You could see it as factorised skill learning. In one place it learns how to do some task, in another it learns how to write nice code, and then it does both at the same time. It learns the whole range of code styles it has seen and deliberately uses a good one because it has been taught to make that choice, through reinforcement learning from human preferences.