Hacker News
by drexlspivey 1127 days ago
AFAIK Transformers and context size are orthogonal concepts. You could have large token contexts before transformers. The transformer directs the “attention” to a specific word/token inside the context.
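To make that concrete, here is a rough sketch of single-query scaled dot-product attention in plain Python. The function names (`softmax`, `attend`) and the toy vectors are made up for illustration and not taken from any library. The point is only that the same function runs on a context of 2 tokens or 9 tokens, and the weights show which token the query is "attending" to.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Hypothetical helper: one query vector attends over a context of any length.
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in keys
    ]
    weights = softmax(scores)  # one weight per context token, sums to 1
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy data (assumed): 2-dimensional token vectors used as both keys and values.
query = [1.0, 0.0]
short_context = [[1.0, 0.0], [0.0, 1.0]]
long_context = short_context + [[0.0, 1.0]] * 6 + [[4.0, 0.0]]

short_weights, _ = attend(query, short_context, short_context)
long_weights, _ = attend(query, long_context, long_context)

# Index of the token that receives the most attention in the longer context.
most_attended = max(range(len(long_weights)), key=lambda i: long_weights[i])
```

Nothing in `attend` fixes the context length. In this naive form the cost grows with the number of keys, which is a practical limit rather than something built into the mechanism.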