	by ddren 1205 days ago
	Seeing the performance of implementations like FlexGen [1], I don't think it would be entirely unreasonable to run a 13B model on a single GPU for personal use. You are not going to run a public service on it, but it would probably be good enough to run your own ChatGPT or Copilot locally. [1]: https://github.com/FMInference/FlexGen
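
	For a rough sense of why a 13B model is borderline on a single GPU, here is a back-of-the-envelope sketch of the memory needed for the weights alone. It assumes 13e9 parameters and ignores activations, the KV cache, and framework overhead, so real usage will be higher. The function name and the list of precisions are my own illustration, not anything taken from FlexGen.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


# Assumption: "13B" means roughly 13e9 parameters.
N_PARAMS = 13e9

# Illustrative precisions; which ones a given runtime supports is not assumed here.
for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

	At fp16 the weights alone come to about 24 GiB, which is at or beyond the memory of most consumer cards, so fitting on one GPU generally depends on lower-precision weights or on offloading part of the model to CPU memory or disk.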