by jumpCastle 812 days ago
Also, the parameters are optimized with the loss on future tokens in the sequence.
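A toy sketch of the point, under loudly stated assumptions. The "model" here is hypothetical: a causal running mean of scalar embeddings stands in for causal self-attention, a single weight `w` stands in for the network, and squared error stands in for cross-entropy. It computes, for each position t, how much the loss at t pushes on the first token's embedding `x[0]`. Because every later position reads `x[0]`, the losses at positions 1, 2, 3 all send nonzero gradient back to it, not just the loss at position 0.

```python
"""Toy illustration (assumptions: a causal running mean replaces attention,
squared error replaces cross-entropy, and all names are hypothetical)."""


def per_position_losses(x, y, w):
    """Return [L_0, ..., L_{T-1}] with L_t = (w * mean(x[0..t]) - y[t]) ** 2."""
    losses = []
    running_sum = 0.0
    for t, (x_t, y_t) in enumerate(zip(x, y)):
        running_sum += x_t
        h_t = running_sum / (t + 1)  # causal: position t only sees x[0..t]
        losses.append((w * h_t - y_t) ** 2)
    return losses


def grad_wrt_first_embedding(x, y, w):
    """Analytic dL_t/dx[0] for each position t.

    h_t depends on x[0] with coefficient 1/(t+1), so
    dL_t/dx[0] = 2 * (w * h_t - y_t) * w / (t + 1).
    """
    grads = []
    running_sum = 0.0
    for t, (x_t, y_t) in enumerate(zip(x, y)):
        running_sum += x_t
        h_t = running_sum / (t + 1)
        grads.append(2.0 * (w * h_t - y_t) * w / (t + 1))
    return grads


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]  # toy token embeddings
    y = [2.0, 2.0, 2.0, 2.0]  # toy targets, one per position
    w = 0.5                   # toy shared parameter
    grads = grad_wrt_first_embedding(x, y, w)
    print("dL_t/dx[0] per position:", grads)
    print("from position 0's own loss only:", grads[0])
    print("from the summed loss over all positions:", sum(grads))
```

The gradient the first token actually receives during training is the sum over all positions, so whatever computes its representation is shaped by how useful it is for predicting tokens further along, not only the immediately next one.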