Hacker News
by sleepyeldrazi 36 days ago
Think of this as another way of achieving that. In theory it has a higher ceiling on how much it can predict at a time, and, more importantly, it is a lot more memory efficient during actual inference.