by version_five 1202 days ago
Try this; it's for running LLMs that won't fit in GPU memory: https://github.com/FMInference/FlexGen

Currently it looks like it only supports Facebook's OPT and Galactica models, though they do appear to plan to add support for more.