by jgehring 3325 days ago
Yes, ByteNet v2 outperforms LSTMs on characters but not on word pieces. It would be interesting to see how our model performs on characters, especially when scaled up to the size of ByteNet (30+30 layers), and also how ByteNet performs on BPE codes. I think character-level NMT is definitely interesting and worth investigating, but from a practical point of view it makes sense to choose the representation that gives the best translation accuracy and speed.