Hacker News
by
version_five
1202 days ago
Try this. It's for running LLMs that won't fit in GPU memory:
https://github.com/FMInference/FlexGen
1 comment
gpm
1202 days ago
Currently it looks like that only supports Facebook's OPT and Galactica models, though they do appear to plan to add support for more models.