Y
Hacker News
new
|
ask
|
show
|
jobs
by
mathis
328 days ago
This might be more pure, but there is nothing to be gained. On the contrary, this would lead to very long sequences for which self-attention scales poorly.