by joaogante
1124 days ago
A 3090 (or any GPU with >=20GB VRAM) can run StarCoder with int8 quantization at about 12 tokens per second, or 33 with assisted generation, which will come out for StarCoder in the coming days. When 4-bit quantization comes out, I would expect a GPU with 12GB VRAM to be able to run it. Disclaimer: I work at Hugging Face.
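
For a rough sense of where those VRAM figures come from, here is a back-of-the-envelope sketch. It assumes StarCoder has roughly 15.5B parameters and counts only the weights; the helper name `weight_memory_gb` is made up for illustration, and the headroom needed on top of the weights (activations, KV cache, framework overhead) is not modeled.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: StarCoder has roughly 15.5 billion parameters.
STARCODER_PARAMS = 15.5e9

for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
    gb = weight_memory_gb(STARCODER_PARAMS, bits)
    print(f"{label}: ~{gb:.1f} GB for weights")
```

Under these assumptions the int8 weights take about 15.5 GB, which fits a 20GB+ card with some room to spare, and 4-bit weights take under 8 GB, which is consistent with expecting a 12GB card to work.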