Hacker News
by projectorlochsa 3317 days ago
Well, BLEU is non-differentiable and does not decompose over the sequence of translation decisions. Still, I wouldn't call a method reinforcement learning just because its loss is tricky.

But yeah, I guess there's more to it than meets the eye.

1 comment

I suspect (I have not read that much NLP literature) that BLEU is typically used as an evaluation metric only, not as the training loss. E.g. Google's paper "Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation" mentions directly optimizing a BLEU-style sentence-level score (GLEU, if I remember right), but again via RL and not supervised learning. It certainly is a quirky example of RL, though... I guess that's the pace at which new ideas and approaches get introduced these days.
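To make the "via RL" point concrete, here is a minimal sketch of how a count-based, sequence-level score can still drive training through a score-function (REINFORCE-style) update. Everything in it is an assumption for illustration: `toy_reward` is a stand-in built from clipped 1-gram and 2-gram precision, not real BLEU (no brevity penalty, no 4-grams), and the "model" is just a table of per-position logits rather than a translation network. It is not what the Google paper does, only the general shape of the idea.

```python
import math
import random
from collections import Counter


def ngram_precision(hyp, ref, n):
    """Clipped n-gram precision of hyp against a single reference."""
    hyp_counts = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
    return matched / total


def toy_reward(hyp, ref):
    """Hypothetical stand-in for BLEU: mean of 1-gram and 2-gram precision.

    It is built from integer counts, so it has no useful gradient with
    respect to the model, and it scores the whole sequence at once.
    """
    return 0.5 * (ngram_precision(hyp, ref, 1) + ngram_precision(hyp, ref, 2))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def train(ref, vocab_size, steps=3000, lr=0.5, seed=0):
    """REINFORCE on a toy policy: one independent softmax per output position."""
    rng = random.Random(seed)
    logits = [[0.0] * vocab_size for _ in ref]
    baseline = 0.0  # running mean of rewards, used to reduce variance
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        sample = [rng.choices(range(vocab_size), weights=p)[0] for p in probs]
        reward = toy_reward(sample, ref)  # treated as a black box
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for t, tok in enumerate(sample):
            for v in range(vocab_size):
                # d/d logit of log softmax at the sampled token
                grad_logp = (1.0 if v == tok else 0.0) - probs[t][v]
                logits[t][v] += lr * advantage * grad_logp
    return logits


def greedy(logits):
    """Pick the highest-scoring token at each position."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


if __name__ == "__main__":
    reference = [2, 0, 3]
    trained = train(reference, vocab_size=4)
    best = greedy(trained)
    print(best, toy_reward(best, reference))
```

The update never differentiates through `toy_reward`; it only needs the gradient of the log-probability of the sampled sequence, which is why a non-differentiable metric is not a blocker. Whether that alone makes something "reinforcement learning" is exactly the naming question raised above.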