| HN Mirror


by 4b11b4 400 days ago

This isn't quite RL, right? It's an evolutionary approach applied to specifically labeled sections of code, optimizing towards a set of metrics defined by human-written evaluation functions.

I suppose you could consider that last part (optimizing some metric) "RL".
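To make "evolutionary approach optimizing towards a metric" concrete, here is a rough toy sketch. It is not the system being discussed: the candidate is just a list of three numbers standing in for a labeled section of code, and `evaluate`, `mutate`, and `evolve` are hypothetical names made up for illustration.

```python
import random

def evaluate(candidate):
    # Hypothetical human-written evaluation function; higher is better.
    # The toy "metric" is closeness to a fixed target vector.
    target = [3.0, -1.0, 2.0]
    return -sum((c - t) ** 2 for c, t in zip(candidate, target))

def mutate(candidate, rng, scale=0.5):
    # Variation step: perturb one randomly chosen coordinate.
    child = list(candidate)
    i = rng.randrange(len(child))
    child[i] += rng.gauss(0.0, scale)
    return child

def evolve(generations=200, population_size=20, keep=5, seed=0):
    rng = random.Random(seed)
    population = [[0.0, 0.0, 0.0] for _ in range(population_size)]
    for _ in range(generations):
        # Selection: keep the top scorers unchanged (elitism).
        population.sort(key=evaluate, reverse=True)
        parents = population[:keep]
        # Refill the population with mutated copies of the survivors.
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(population_size - keep)]
        population = parents + children
    return max(population, key=evaluate)

best = evolve()
print(best, evaluate(best))
```

Note there is no policy, state, or reward-over-time here, just repeated mutate, score, and select against a fixed metric.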

However, it's missing a key concept of RL: the exploration/exploitation tradeoff.
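For anyone unfamiliar with the term, the textbook illustration of that tradeoff is an epsilon-greedy multi-armed bandit. This is only a minimal sketch under assumed values: the function name `epsilon_greedy`, the arm means, and `epsilon=0.1` are all made up for the example and have nothing to do with the system under discussion.

```python
import random

def epsilon_greedy(true_means, steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n        # how often each arm was pulled
    estimates = [0.0] * n   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm, even one that currently looks bad.
            arm = rng.randrange(n)
        else:
            # Exploit: pull the arm with the best estimate so far.
            arm = max(range(n), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the running mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates

counts, estimates = epsilon_greedy([0.1, 0.5, 0.9])
print(counts, estimates)
```

The `epsilon` knob is the explicit tradeoff: raise it and the agent spends more pulls gathering information, lower it and it commits sooner to whatever currently looks best.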