by dooxoo
3326 days ago
> there have been a couple of attempts to use CNNs for translation already, but none of them outperformed big and well-tuned LSTM systems

It is true that QRNN reported results mostly on small-scale benchmarks, but it seemed that ByteNet, especially the second version, had SOTA results both for character-level language modeling and for character-level machine translation, the latter on the same large-scale En-De WMT task that is used in this paper. With regard to ordering, structure, etc., character-level MT is potentially much harder than MT with words or word-pieces, since the encoded sequences are 5 or 6 times longer on average and the meanings of words need to be built up from individual characters.