
I suspect (though I haven't read much of the NLP literature) that BLEU is typically used only for evaluation, not as the training loss. For example, Google's "Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation" does mention optimizing directly for BLEU (strictly, a sentence-level variant it calls GLEU), but again via RL rather than supervised learning, since the score is computed on discrete output tokens and isn't differentiable. It certainly is a quirky example of RL, though... I guess that's just the pace at which new ideas and approaches are introduced these days.
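To make the "evaluation metric vs. training loss" distinction concrete, here's a minimal Python sketch, not GNMT's actual setup. `sentence_bleu` and `reinforce_loss` are hypothetical helpers. The BLEU is simplified (single reference, add-one smoothing on every n-gram order), so it isn't a drop-in for sacreBLEU or NLTK. The point is that the score is just a float computed from discrete tokens. In a REINFORCE-style update it only *scales* the log-probabilities of a sampled translation and is never itself differentiated.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Simplified single-reference sentence BLEU with add-one smoothing (an assumption)."""
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        # Clipped matches: a hypothesis n-gram counts at most as often as it appears in the reference.
        matches = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        # Add-one smoothing so one missing n-gram order doesn't zero the whole score.
        log_precision_sum += math.log((matches + 1) / (total + 1))
    # Brevity penalty: punish hypotheses shorter than the reference.
    if len(hypothesis) > len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(hypothesis))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


def reinforce_loss(token_log_probs, reward, baseline):
    """REINFORCE surrogate loss for one sampled translation.

    `reward` (e.g. sentence BLEU) is a constant here. In a real framework the
    log-probs would be tensors, and gradients would flow only through them,
    weighted by how much better than `baseline` the sample scored.
    """
    advantage = reward - baseline
    return -advantage * sum(token_log_probs)


if __name__ == "__main__":
    reference = "the cat sat on the mat".split()
    sample = "the cat is on the mat".split()  # pretend this was sampled from the model
    reward = sentence_bleu(sample, reference)
    # Made-up per-token log-probabilities for the sampled tokens.
    loss = reinforce_loss([-0.1, -0.2, -1.5, -0.3, -0.1, -0.2], reward, baseline=0.3)
    print(f"reward={reward:.3f} loss={loss:.3f}")
```

Compare this with supervised (cross-entropy) training, where the loss is a direct function of the model's probabilities. Here BLEU only enters as a per-sample weight, which is why optimizing it "directly" ends up looking like RL.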