by zuzun 1125 days ago
If I understand it correctly, you are only attending to preceding tokens in your paper. Can the constant bias matrix be made symmetric for unmasked tasks?
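
To make the question concrete, here is a rough sketch of what I mean. It assumes the bias is a fixed penalty that grows linearly with the distance between query and key, scaled by a per-head slope. That form, the `slope` parameter, and the function names are my own illustration, not something taken from the paper.

```python
# Hypothetical sketch: a distance-based attention bias, causal vs. symmetric.
# The linear-in-distance form and all names here are assumptions for illustration.

def causal_bias(n, slope):
    """Bias for masked attention: query i only sees keys j <= i."""
    return [
        [-slope * (i - j) if j <= i else float("-inf") for j in range(n)]
        for i in range(n)
    ]

def symmetric_bias(n, slope):
    """Bias for unmasked attention: penalty depends only on |i - j|."""
    return [[-slope * abs(i - j) for j in range(n)] for i in range(n)]

def is_symmetric(matrix):
    """True if matrix[i][j] == matrix[j][i] for every pair of indices."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))

if __name__ == "__main__":
    for row in causal_bias(4, 0.5):
        print(row)
    for row in symmetric_bias(4, 0.5):
        print(row)
```

In this sketch the symmetric version just mirrors the lower triangle into the upper one, so a token two positions ahead is penalized the same as a token two positions behind. Whether that loses useful direction information in practice is exactly what I am asking.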