by unhammer 3289 days ago
Bit of an aside: Apparently ChrF (character-level n-gram F-score) is the new hotness in evaluating MT systems http://www.aclweb.org/anthology/W/W15/W15-30.pdf#page=412
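For anyone wondering what a character-level n-gram F-score boils down to, here is a rough Python sketch. It is not the reference implementation from the linked paper: the function names, the choice to drop spaces, and the defaults `max_n=6` and `beta=3.0` are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of order n (assumption: spaces are dropped)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=3.0):
    """Illustrative character n-gram F-score between two strings.

    max_n and beta are assumed defaults, not values taken from the thread.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # one string is too short to have n-grams of this order
        overlap = sum((hyp & ref).values())  # clipped count of matching n-grams
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)  # precision averaged over orders
    r = sum(recalls) / len(recalls)        # recall averaged over orders
    if p + r == 0:
        return 0.0
    # F-beta: beta > 1 weights recall more heavily than precision
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Calling `chrf("abc", "abc")` gives 1.0, and two strings that share no characters give 0.0. Note that it still needs a human-written reference translation to compare against.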

OK, but this still has the same problem as BLEU: it relies on comparisons to human scores, which are entirely subjective. I'm not saying they're not the best we've got, but it's a big problem for machine translation that the only way to evaluate results is, essentially, to compare them against eyeballing.