by dooxoo 3326 days ago
> there have been a couple of attempts to use CNNs for translation already, but none of them outperformed big and well-tuned LSTM systems

It is true that QRNN reported results mostly on small-scale benchmarks, but ByteNet, especially the second version, seemed to have SOTA results for both character-level language modeling and character-level machine translation, on the same large-scale En-De WMT task that is used in this paper.

Character-level MT is potentially much harder than MT with words or word pieces with regard to ordering, structure, etc., since the encoded sequences are 5 or 6 times longer on average and the meanings of words need to be built up from individual characters.
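
To make the length argument concrete, here is a rough sketch of how one could estimate the character-to-word length ratio on a corpus. The function name `length_ratio` and the two sample sentences are my own illustration, not anything from the paper or from ByteNet, and plain whitespace splitting stands in for a real tokenizer.

```python
def length_ratio(sentences):
    """Return total characters divided by total whitespace-separated tokens.

    Hypothetical helper for illustration: spaces count as characters,
    since a character-level model has to encode them too.
    """
    n_chars = sum(len(s) for s in sentences)
    n_words = sum(len(s.split()) for s in sentences)
    if n_words == 0:
        raise ValueError("no tokens found in input")
    return n_chars / n_words


# Toy sample, made up for this example; a real estimate would use WMT data.
sample = [
    "the cat sat on the mat",
    "character level models see much longer sequences",
]

ratio = length_ratio(sample)
print(f"characters per word (spaces included): {ratio:.2f}")
```

On real En-De text the ratio depends on the language and the tokenization, so treat the number from a toy sample like this as indicative only.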

1 comments

Yes, ByteNet v2 outperforms LSTMs on characters but not on word pieces. It would be interesting to see how our model performs on characters, especially when scaled up to the size of ByteNet (30+30 layers) and also how ByteNet performs on BPE codes. I think that character-level NMT is definitely interesting and worth investigating, but from a practical point of view it makes sense to choose a representation that achieves the maximum translation accuracy and speed.