Hacker News
by sleepyeldrazi, 36 days ago
Think of this as another way of achieving that. In theory, it has a higher ceiling on how much it can predict at a time. More importantly, it is a lot more memory-efficient during actual inference.