by czhu12 2853 days ago
Researchers have used reinforcement learning techniques to train neural models with non-differentiable loss functions.

For instance, a common practice is to (1) train a translation model with standard cross-entropy loss, then (2) fine-tune the model with reinforcement learning against BLEU scores.

BLEU is generally the metric that's reported in translation papers, but it can't be used directly as a loss for gradient-based training since it isn't differentiable.

Reinforcement learning can be used to estimate gradients for these types of objectives by treating the score as a reward, for example with policy gradient methods.
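To make that concrete, here is a minimal sketch of a policy gradient (REINFORCE-style) update on a toy problem. It is not a translation model: the 4-action softmax policy, the `reward` step function standing in for a metric like BLEU, and the names `softmax` and `train` are all assumptions made up for illustration. The point is only that the update needs the reward's value, never its derivative.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward(action):
    # Hypothetical non-differentiable score (a step function).
    # Stands in for a metric such as BLEU; only its value is used.
    return 1.0 if action == 2 else 0.0


def train(steps=2000, lr=0.1, n_actions=4, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * n_actions  # policy parameters
    baseline = 0.0              # running average reward, reduces variance
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action from the current policy.
        action = rng.choices(range(n_actions), weights=probs)[0]
        advantage = reward(action) - baseline
        # For a softmax policy, d log p(action) / d logit_i
        # equals one_hot(action)_i - probs_i.
        for i in range(n_actions):
            grad_logp = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_logp
        baseline += 0.05 * (reward(action) - baseline)
    return softmax(logits)


final_probs = train()
print(final_probs)
```

In the translation setting described above, the "action" would roughly correspond to a sampled translation and the reward to its BLEU score, with the cross-entropy-trained model as the starting policy.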

Game playing, recommendations, and ad suggestions are all examples of problems with non-differentiable objectives, but I think there are many more problems of this kind that we haven't explored as deeply and that could potentially be solved with RL.