by ddren
1205 days ago
Seeing the performance of implementations like FlexGen [1], I don't think it would be unreasonable to run a 13B model on a single GPU for personal use. You are not going to run a public service on it, but it would probably be good enough to run your own ChatGPT or Copilot locally.

[1]: https://github.com/FMInference/FlexGen