HN Mirror



	by version_five 1202 days ago
	Try this; it's for running LLMs that won't fit in GPU memory: https://github.com/FMInference/FlexGen

1 comment

Currently it looks like that only supports Facebook's OPT and Galactica models, though they do appear to plan to add support for more.