by projectorlochsa
3325 days ago
I don't think reinforcement learning is equivalent to optimizing a joint loss. Their model executes X steps, then they calculate the loss against supervised data and use that loss to learn. The same thing is done with machine translation models when they optimize for BLEU. It's still supervised learning, because you need reference data to calculate the loss.
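To make the point concrete, here is a toy sketch of what I mean, not their actual model. All names (`run_steps`, `supervised_loss`, `train`) and the one-parameter "model" are made up for illustration. The model is unrolled for a fixed number of steps, and only then is a loss computed, which cannot be evaluated without the reference value:

```python
def run_steps(weight, x, num_steps):
    """Unroll a toy one-parameter model for num_steps steps."""
    state = x
    for _ in range(num_steps):
        state = weight * state
    return state


def supervised_loss(prediction, reference):
    """Squared error against reference data; undefined without a reference."""
    return (prediction - reference) ** 2


def train(x, reference, num_steps=3, lr=0.001, epochs=1000):
    """Gradient descent on the loss computed after the full unroll."""
    weight = 0.5  # arbitrary starting value for the sketch
    for _ in range(epochs):
        prediction = run_steps(weight, x, num_steps)
        # Analytic gradient of (weight**num_steps * x - reference)**2
        # with respect to weight.
        grad = (2 * (prediction - reference)
                * num_steps * weight ** (num_steps - 1) * x)
        weight -= lr * grad
    return weight


if __name__ == "__main__":
    learned = train(x=1.0, reference=8.0)
    print(learned, supervised_loss(run_steps(learned, 1.0, 3), 8.0))
```

The learning signal here only exists because `reference` is passed in. Take the reference away and there is nothing to compute the loss from, which is why I'd still call this kind of setup supervised.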