by sr-latch 1178 days ago
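Would be cool to try to incorporate the previous token's confidence embedding into this process, but that would make training with a triangular (causal) attention mask impossible, since each position's input would then depend on the model's own output at the previous position.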
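A toy sketch of why the mask argument holds. It assumes scalar per-token inputs, a trivial averaging layer standing in for real attention, and a sigmoid "confidence" added to the next token's input. `causal_mask`, `attend`, `confidence`, and this injection scheme are all hypothetical illustrations, not the method being discussed.

```python
import math


def causal_mask(n):
    # Lower-triangular ("triangular") mask: position i may see positions j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]


def attend(xs, mask):
    # Toy stand-in for attention: each position averages the inputs it may see.
    out = []
    for row in mask:
        visible = [x for x, ok in zip(xs, row) if ok]
        out.append(sum(visible) / len(visible))
    return out


def confidence(h):
    # Hypothetical confidence score in (0, 1): a sigmoid of the toy output.
    return 1.0 / (1.0 + math.exp(-h))


def parallel_teacher_forced(tokens):
    # Teacher forcing: every input is known before the forward pass,
    # so one masked pass computes all positions at once.
    return attend(tokens, causal_mask(len(tokens)))


def sequential_with_confidence(tokens, use_confidence=True):
    # With confidence feedback, the input at step t depends on the model's
    # own output at step t-1, so the steps must run one after another.
    inputs, outputs = [], []
    prev_conf = 0.0  # assumption: no confidence signal before the first token
    for t, tok in enumerate(tokens):
        # Hypothetical injection: add the previous step's confidence to the input.
        inputs.append(tok + (prev_conf if use_confidence else 0.0))
        h = attend(inputs, causal_mask(t + 1))[-1]
        outputs.append(h)
        prev_conf = confidence(h)
    return outputs
```

With `tokens = [1.0, 2.0, 3.0]`, the single parallel pass returns `[1.0, 1.5, 2.0]`, and the sequential loop reproduces it when `use_confidence=False`. With confidence feedback enabled, the later positions change, and the parallel pass cannot produce them because its inputs would have to exist before the pass runs.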