Hacker News
by Glemllksdf 60 days ago
Isn't that some kind of gambling if you offload random experts onto the CPU?

Or is it only whole layers that get offloaded? But that would affect all experts.

2 comments

Pretty sure all partial offload systems I’ve seen work by layers, but there might be something else out there.
Speculative decoding is already gambling.
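
For anyone following the layers-versus-experts question, here is a rough sketch of what a layer-wise split would imply for a mixture-of-experts model. It assumes placement is decided per layer, as the reply above suggests. The names `layer_placement` and `expert_device` and the layer and expert counts are made up for illustration and are not any particular tool's API.

```python
# Hypothetical sketch: layer-wise partial offload for a mixture-of-experts model.
# Assumption: the device is chosen per layer, never per expert.

def layer_placement(n_layers: int, n_gpu_layers: int) -> list[str]:
    """Put the first n_gpu_layers layers on the GPU and the rest on the CPU."""
    return ["gpu" if i < n_gpu_layers else "cpu" for i in range(n_layers)]

def expert_device(placement: list[str], layer: int, expert: int) -> str:
    """Device for a given expert; the expert index is deliberately ignored
    because every expert in a layer shares that layer's device."""
    return placement[layer]

placement = layer_placement(n_layers=4, n_gpu_layers=3)

# All 8 (illustrative) experts of the offloaded last layer land on the same device.
devices = {expert_device(placement, layer=3, expert=e) for e in range(8)}
print(placement, devices)
```

Under that assumption nothing is left to chance: an offloaded layer is slow for whichever expert the router picks, and a resident layer is fast for all of them.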