by q1w2 1200 days ago
Great, now how do I run it? Do I need a GPU with over 65GB RAM?
3 comments

Try this, it's for running LLMs that won't fit in the GPU: https://github.com/FMInference/FlexGen
Currently that looks like it only supports Facebook's OPT and Galactica models, though they do appear to plan to add support for more models.
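Generally, you'll need to multiply the model size by two to get the required amount of video RAM: assuming 16-bit weights, each parameter takes 2 bytes, so a model with N billion parameters needs roughly 2N GB. There are 4 sizes, so you might get away with an even smaller GPU for, say, the 13B model.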
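To make that rule of thumb concrete, here's a rough back-of-the-envelope sketch. The function name `estimate_vram_gb` and the 2-bytes-per-parameter default (16-bit weights) are my own assumptions, not anything from the model's docs, and the 65B figure is just an illustrative parameter count. It only counts the weights, so treat the result as a lower bound; activations and framework overhead come on top.

```python
def estimate_vram_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
    """Rough memory needed for the weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same precision
    (2 bytes = 16-bit by default). Activations and overhead are ignored.
    """
    # billions of params * bytes per param = billions of bytes = GB
    return params_billions * bytes_per_param


if __name__ == "__main__":
    # 13B is the size mentioned above; 65B is an illustrative larger size.
    for size in (13, 65):
        print(f"{size}B params: ~{estimate_vram_gb(size):.0f} GB for weights")
```

Passing `bytes_per_param=1` gives the same estimate for 8-bit weights, if whatever you run the model with supports that.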
Nope, more like 111 GB.