Hacker News new | ask | show | jobs
by cowsaymoo 709 days ago
The vocabulary used here doesn't have sufficient intrinsic dimension to partition the input into a low loss prediction. Improvement is promising with larger context or denser attention.