Hacker News
by leereeves 1459 days ago
It would probably just fail with an error like "[some function] not implemented for 'Half'".
1 comment

fp16 models run inference just fine in fp32. I was sort of joking in my original comment, though: it could take weeks to run a single input this way. You're better off trying to make something like Hugging Face Accelerate work (as the comment above suggests), which swaps layers of the model between disk and memory.
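
To make the layer-swapping idea concrete, here's a toy sketch in plain Python. It is not Accelerate's API or how it is implemented; every function name here (`save_layer_fp16`, `load_layer`, `run_offloaded`, `demo`) is made up for illustration. Weights sit on disk as fp16, one layer at a time is read back and upcast (to Python's 64-bit floats, standing in for fp32), used, then dropped:

```python
import os
import struct
import tempfile


def save_layer_fp16(path, weights):
    """Write one layer's weight matrix to disk as half-precision floats."""
    rows, cols = len(weights), len(weights[0])
    flat = [w for row in weights for w in row]
    with open(path, "wb") as f:
        f.write(struct.pack("<II", rows, cols))          # small shape header
        f.write(struct.pack(f"<{rows * cols}e", *flat))  # 'e' = IEEE 754 half


def load_layer(path):
    """Read a layer back; struct upcasts each fp16 value to a Python float."""
    with open(path, "rb") as f:
        rows, cols = struct.unpack("<II", f.read(8))
        flat = struct.unpack(f"<{rows * cols}e", f.read(2 * rows * cols))
    return [list(flat[r * cols:(r + 1) * cols]) for r in range(rows)]


def matvec(weights, x):
    """Plain matrix-vector product, done in full Python float precision."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def run_offloaded(layer_paths, x):
    """Forward pass that holds only one layer's weights in memory at a time."""
    for path in layer_paths:
        weights = load_layer(path)  # swap the layer in from disk
        x = matvec(weights, x)
        del weights                 # drop it before loading the next one
    return x


def demo():
    # Two tiny hypothetical "layers"; all values are exactly representable in fp16.
    layers = [
        [[0.5, 0.25], [1.0, -1.0]],
        [[2.0, 0.0], [0.0, 4.0]],
    ]
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, w in enumerate(layers):
            p = os.path.join(tmp, f"layer{i}.bin")
            save_layer_fp16(p, w)
            paths.append(p)
        return run_offloaded(paths, [2.0, 4.0])


print(demo())  # [4.0, -8.0]
```

Peak memory is roughly one layer plus the activations, and the price is a disk read per layer per input, which is why this is slow but still beats not fitting in memory at all.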