Hacker News
by visarga 1093 days ago
Don't you know that "attention is all you need"? Attention is non-Markovian. It's all-to-all with some masking (e.g. a causal mask), not a chain.
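A minimal sketch of what "all-to-all with some masking" means in practice. It assumes a single head and uses the raw input vectors as both queries and keys, with no learned projections. `attention_weights` is a hypothetical helper written only for illustration. Row `t` of the result shows how much position `t` attends to each position `j`.

```python
import math


def softmax(scores):
    """Numerically stable softmax; scores of -inf get exactly zero weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(xs, causal=True):
    """Scaled dot-product attention weights for a toy sequence.

    Illustrative assumption: queries and keys are the raw vectors in xs
    (single head, no learned W_Q / W_K). With causal=True, position t can
    only see positions 0..t; with causal=False every position sees all.
    """
    n, d = len(xs), len(xs[0])
    rows = []
    for t in range(n):
        scores = []
        for j in range(n):
            if causal and j > t:
                scores.append(-math.inf)  # masked: future position is hidden
            else:
                dot = sum(a * b for a, b in zip(xs[t], xs[j]))
                scores.append(dot / math.sqrt(d))
        rows.append(softmax(scores))
    return rows


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    for t, row in enumerate(attention_weights(tokens)):
        print(t, [round(w, 3) for w in row])
```

Each row has nonzero weight on every visible position 0..t and zero on the masked ones, so the last position reads directly from the first token. In a first-order Markov chain, by contrast, step t reads only from state t-1, and all earlier history has to be squeezed through that single state.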