Y
Hacker News
new
|
ask
|
show
|
jobs
by
BorisTheBrave
1120 days ago
I think you've misunderstood. Megabyte scales better with context window length. I don't know if they're saying the training data / compute are any more efficient.