by drexlspivey 1127 days ago
AFAIK transformers and context size are orthogonal concepts. You could have large token contexts before transformers. What the transformer adds is “attention”: it weights every word/token inside the context, so the model can focus on the specific ones that matter.
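
To make the attention point concrete, here is a rough toy sketch of scaled dot-product attention for a single query, in plain Python. The names (`softmax`, `attend`) and the 2-dimensional vectors are made up for illustration and are not taken from any particular model or library. Note that nothing in it fixes the number of tokens: the same function runs over a context of 4 tokens or 1000.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Score each context token by its scaled dot product with the query.
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    # Turn the scores into weights that sum to 1 across the whole context.
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, out

# Hypothetical toy context: 4 tokens with 2-dimensional vectors.
keys = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [-1.0, 0.0]]
values = keys
query = [1.0, 1.0]

weights, out = attend(query, keys, values)
print(weights)  # the third token (index 2) gets by far the largest weight
```

The weights are a soft distribution over all tokens rather than a hard pick of one, and the context length only shows up as the length of `keys` and `values`.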