Hacker News new | ask | show | jobs
by BorisTheBrave 1120 days ago
I think you've misunderstood. Megabyte scales better with context window length. I don't know if they're saying the training data / compute are any more efficient.