by leereeves 1458 days ago
A comment below said this model uses fp16 (half-precision). If so, it won't easily run on CPU because PyTorch doesn't have good support for fp16 on CPU.

Parent never claimed it was going to be fast.
It would probably just fail with an error like "[some function] not implemented for 'Half'".
fp16 models run inference just fine in fp32, though. I was sorta joking in my original comment: it could take weeks for this to run a single input. You're better off trying to make something like Hugging Face Accelerate work (like the comment above), which swaps layers of the model on and off the disk.
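For what it's worth, the reason the fp32 route is safe is that every fp16 value is exactly representable in fp32, so upcasting the weights loses nothing; it only doubles the bytes per parameter. Here is a rough stdlib-only sketch that checks this (no PyTorch involved, and the helper names are made up for illustration):

```python
import struct


def fp16_bits_to_float(bits: int) -> float:
    """Decode a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def survives_fp32(value: float) -> bool:
    """Return True if value is unchanged after a round trip through 32-bit storage."""
    return struct.unpack("<f", struct.pack("<f", value))[0] == value


def all_fp16_fit_in_fp32() -> bool:
    """Check every one of the 65536 fp16 bit patterns (NaNs are skipped)."""
    for bits in range(0x10000):
        value = fp16_bits_to_float(bits)
        if value != value:  # NaN never compares equal to itself, so skip it
            continue
        if not survives_fp32(value):
            return False
    return True


# Bytes per value in each format: the upcast doubles the memory needed.
FP16_BYTES = struct.calcsize("<e")
FP32_BYTES = struct.calcsize("<f")

if __name__ == "__main__":
    print("lossless upcast:", all_fp16_fit_in_fp32())
    print("bytes per value:", FP16_BYTES, "->", FP32_BYTES)
```

So the cost of going to fp32 is memory, not accuracy, which is why the disk-swapping approach ends up mattering for a model this size.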