Hacker News new | ask | show | jobs
by mrdrozdov 3335 days ago
In this work Convolution Neural Nets (spatial models that have a weakly ordered context, as opposed to Recurrent Neural Nets which are sequential models that have a strongly ordered context) are demonstrated here to achieve State of the Art results in Machine Translation.

It seems the combination of gated linear units / residual connections / attention was the key to bringing this architecture to State of the Art.

It's worth noting that previously the QRNN and ByteNet architectures have used Convolutional Neural Nets for machine translation also. IIRC, those models performed well on small tasks but were not able to best SotA performance on larger benchmark tasks.

I believe it is almost always more desirable to encode a sequence using a CNN if possible as many operations are embarrassingly parallel!

The bleu scores in this work were the following:

Task (previous baseline): new baseline

WMT’16 English-Romanian (28.1): 29.88 WMT’14 English-German (24.61): 25.16 WMT’14 English-French (39.92): 40.46