Hacker News
by larsejonasson 1274 days ago
Could you not use two Markov chains for masked language modeling? One works forward from the beginning of the sentence up to [MASK], and the other works backward from the end up to [MASK]. [MASK] is then set to the average of the two chains' predictions. If no direct average can be found, the gap is assumed to be a multi-word expression, and words are generated from both chains until they match.
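A minimal sketch of the single-word case, under some assumptions of mine: both chains are first-order (bigram) models estimated from a toy corpus, and "average" is read as averaging the two next-word probability distributions and taking the highest-scoring word. The names `train_bigram`, `distribution`, and `fill_mask` are made up for illustration, and the multi-word fallback is not covered.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def distribution(counts, context):
    """Turn the counts for one context token into probabilities."""
    row = counts.get(context, Counter())
    total = sum(row.values())
    return {word: n / total for word, n in row.items()} if total else {}

def fill_mask(tokens, forward, backward, mask="[MASK]"):
    """Average the forward and backward predictions at the mask position."""
    i = tokens.index(mask)
    left = tokens[i - 1] if i > 0 else None
    right = tokens[i + 1] if i + 1 < len(tokens) else None
    p_fwd = distribution(forward, left)    # P(word | left neighbour)
    p_bwd = distribution(backward, right)  # P(word | right neighbour)
    candidates = set(p_fwd) | set(p_bwd)
    if not candidates:
        return None  # neither chain has seen this context
    scores = {w: 0.5 * (p_fwd.get(w, 0.0) + p_bwd.get(w, 0.0)) for w in candidates}
    # Sort first so that ties are broken alphabetically and deterministically.
    return max(sorted(scores), key=scores.get)

# Toy corpus, assumed purely for illustration.
corpus = [s.split() for s in (
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat ate the fish",
)]
forward = train_bigram(corpus)
# The backward chain is the same model trained on reversed sentences.
backward = train_bigram([list(reversed(s)) for s in corpus])

print(fill_mask("the cat [MASK] on the mat".split(), forward, backward))
```

With first-order chains, each side only sees the word directly next to [MASK], so the rest of the sentence has no influence on the prediction.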
1 comment

That seems closer to a BiLSTM?