by danielhanchen 448 days ago
Oh fantastic! For MoEs like DeepSeek, GPUs technically aren't that necessary! I actually tested on 1x H100, with (I think) 30 layers offloaded to the GPU and the other 30 on CPU, and it wasn't that bad at all!
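
If you want to guess a GPU/CPU split like that for your own setup, here is a rough back-of-the-envelope sketch. It is not how any particular runtime decides the split: `split_layers` is a hypothetical helper, and the per-layer size and reserved VRAM below are made-up illustrative numbers, not measurements from my test.

```python
def split_layers(n_layers: int, layer_gib: float, vram_gib: float,
                 reserve_gib: float = 0.0) -> tuple[int, int]:
    """Return (gpu_layers, cpu_layers) for a simple whole-layer split.

    Assumes every layer takes the same amount of memory, which is only
    an approximation for real models.
    """
    if layer_gib <= 0:
        raise ValueError("layer_gib must be positive")
    # VRAM left for weights after reserving room for KV cache, activations, etc.
    usable_gib = max(vram_gib - reserve_gib, 0.0)
    # Fit as many whole layers as possible on the GPU, capped at the model depth.
    gpu_layers = min(n_layers, int(usable_gib // layer_gib))
    return gpu_layers, n_layers - gpu_layers


# Illustrative only: 60 layers at a hypothetical 2 GiB each,
# an 80 GiB card with a hypothetical 20 GiB held back.
print(split_layers(60, 2.0, 80.0, reserve_gib=20.0))  # (30, 30)
```

Whatever lands in the second number stays in system RAM and runs on the CPU.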