by sr-latch 1178 days ago
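It would be cool to try incorporating the previous token's confidence embedding into this process, but that would make training with a triangular (causal) attention mask impossible, since each position's input would then depend on the model's own output at the previous position.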
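To make the dependency concrete, here is a rough NumPy sketch, not anyone's actual implementation. Everything in it is an assumption for illustration: the single-layer `toy_layer`, the hypothetical `conf_dir` vector used to embed a scalar confidence, and the choice of "confidence" as the max softmax probability. It only contrasts the one-pass teacher-forced case with the case where the input at position t needs the model's output at t-1.

```python
import numpy as np

def causal_mask(n):
    # Lower-triangular mask: position i may attend to positions 0..i only.
    return np.tril(np.ones((n, n), dtype=bool))

def toy_layer(x, w_out):
    # One causally masked self-attention layer followed by a vocabulary softmax.
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(causal_mask(n), scores, -np.inf)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)
    logits = (attn @ x) @ w_out
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    return probs / probs.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, d, vocab = 5, 8, 11
tok_emb = rng.normal(size=(n, d))    # embeddings of the known training tokens
w_out = rng.normal(size=(d, vocab))  # toy output projection
conf_dir = rng.normal(size=d)        # hypothetical "confidence embedding" direction

# Standard teacher forcing: every input is known up front, so one parallel pass
# with the triangular mask yields predictions for all positions at once.
parallel_probs = toy_layer(tok_emb, w_out)

def sequential_with_confidence(tok_emb, w_out, conf_dir):
    # With confidence feedback, the input at position t depends on the model's
    # output at position t-1, so the inputs have to be built one step at a time.
    n, _ = tok_emb.shape
    x = tok_emb.copy()
    passes = 0
    for t in range(1, n):
        probs = toy_layer(x[:t], w_out)      # forward pass over the prefix
        passes += 1
        conf = probs[t - 1].max()            # assumed definition of confidence
        x[t] = tok_emb[t] + conf * conf_dir  # fold confidence into the next input
    return toy_layer(x, w_out), passes + 1   # final pass over the full sequence

seq_probs, n_passes = sequential_with_confidence(tok_emb, w_out, conf_dir)
```

In this sketch the feedback variant needs one forward pass per position instead of a single pass for the whole sequence, which is the parallelism the triangular mask normally buys during training.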