Hacker News
by danielhanchen 448 days ago
Oh fantastic! For MoEs like DeepSeek, GPUs technically aren't that necessary! I actually tested on 1x H100, with (I think) 30 layers offloaded to the GPU and the other 30 on CPU, and it wasn't that bad at all!