Hacker News new | ask | show | jobs
by skyde 751 days ago
Why not apply same concept every time a word is split into more than one token?

Basically if a word contain a Prefix, suffix or root word. We could have a token position relative to the start of the word in the embedding.

1 comments

It seems it has been done before:

"Syntax-Aware Transformer Models for Neural Machine Translation" by Yang et al. (2019). This model enhances the transformer architecture with syntax-aware attention mechanisms that consider dependency parse trees.

Context-Aware Neural Machine Translation Learns Anaphora Resolution" by Bawden et al. (2018). This paper explores integrating context and syntax into neural machine translation models.