Hacker News new | ask | show | jobs
by nestorD 845 days ago
MoE let's you use scale model size up with compute. That leads to hopefully more intelligent models. It, however, is independent with context size: the ability to process a lot of tokens / text.