by gloxkiqcza 89 days ago
You can do this on a Mac as well tho, right? So that 128 GB of unified memory becomes a cache for the very fast 1+ TB Apple SSD.
1 comment

I think the advantage of Flash-MoE over plain mmap is mostly the coalesced representation, where each expert-layer is stored as a single extent of sequential data. That could be introduced to existing binary formats like GGUF or HF: there is already a provision for differently structured representations, so it would fit easily.
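
To make "a single extent per expert-layer" concrete, here is a rough Python sketch. It is not Flash-MoE's actual format, nor GGUF's or HF's: the tensor names, sizes, file layout, and the helpers `write_coalesced` and `read_expert` are all made up for illustration. The only point is that once an expert's tensors sit back to back, fetching that expert is one sequential slice of the mapping rather than several scattered ones.

```python
import mmap
import os
import tempfile

# Hypothetical toy layout: every (layer, expert) pair owns ONE contiguous extent
# holding all of its tensors back to back. Names and sizes are illustrative only.
TENSOR_SIZES = {"gate": 64, "up": 64, "down": 64}  # bytes per tensor


def write_coalesced(path, n_layers, n_experts):
    """Write the toy file and return an index {(layer, expert): (offset, length)}."""
    index = {}
    with open(path, "wb") as f:
        for layer in range(n_layers):
            for expert in range(n_experts):
                offset = f.tell()
                # Fill the expert's tensors with a recognisable byte so reads can be checked.
                fill = bytes([(layer * n_experts + expert) % 256])
                for size in TENSOR_SIZES.values():
                    f.write(fill * size)
                index[(layer, expert)] = (offset, f.tell() - offset)
    return index


def read_expert(mm, index, layer, expert):
    """Fetch one expert-layer with a single sequential slice of the mapping."""
    offset, length = index[(layer, expert)]
    return mm[offset:offset + length]


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        index = write_coalesced(path, n_layers=2, n_experts=4)
        with open(path, "rb") as f, \
                mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return index, read_expert(mm, index, layer=1, expert=2)
    finally:
        os.remove(path)
```

In a layout grouped by tensor type instead, the same expert would need one lookup per tensor at unrelated offsets, which is the access pattern the coalesced form avoids.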