by xg15 2966 days ago

Sounds like the "classic" problem where someone wants to build a reinforcement-learning system (because "self-improving AI" sounds so cool) but doesn't actually have a suitable reward function that describes their problem.

Nevertheless, they don't let themselves be held up by this minor obstacle and use whatever arbitrary reward function they can implement with the data they have.

The resulting system won't actually learn to solve the original problem - but it will learn something, so, hey, it's self-improving!

See also: Probably every single recommender system in use. (At least that's my subjective impression.)
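
To make the failure mode concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the two items, their click probabilities, and the "true usefulness" numbers are assumptions, and the epsilon-greedy bandit is just the simplest learner I could think of, not a claim about how any real recommender works. The agent is only ever rewarded for clicks (the data we happen to have), never for usefulness (the thing we actually care about).

```python
import random

# Hypothetical items: (name, click probability, true usefulness to the user).
# All numbers are invented for this sketch.
ITEMS = [
    ("clickbait", 0.9, 0.1),
    ("useful", 0.3, 0.9),
]

def train(steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit that only ever observes the proxy reward (clicks)."""
    rng = random.Random(seed)
    counts = [0] * len(ITEMS)
    values = [0.0] * len(ITEMS)  # running mean of observed click reward per item
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(ITEMS))  # explore
        else:
            arm = max(range(len(ITEMS)), key=lambda i: values[i])  # exploit
        # Proxy reward: did the user click? True usefulness is never observed.
        reward = 1.0 if rng.random() < ITEMS[arm][1] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

counts, values = train()
learned = max(range(len(ITEMS)), key=lambda i: values[i])
best_true = max(range(len(ITEMS)), key=lambda i: ITEMS[i][2])
print("agent prefers:", ITEMS[learned][0])
print("actually best:", ITEMS[best_true][0])
```

With these assumed numbers the agent's click estimates improve steadily, so it is "self-improving" in a sense, but the item it settles on is the one that maximizes clicks, not the one that is most useful.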