by moffkalast
1141 days ago
Well, to run the average model as-is, without spending a few days figuring out why you're getting strange errors and can't get it working, you more or less need CUDA support. As much VRAM as you can get is probably also a good idea. For reference, I can seemingly run Vicuna-7B (I think the 4-bit version) on my 6 GB 1660 Ti at roughly 1.5 tokens per second. That's way too slow for anything useful, so you can imagine what CPU inference would look like.
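To put rough numbers on the VRAM point, here is a back-of-the-envelope sketch. It assumes "7B" means exactly 7 billion parameters and counts only the weights, so it ignores the KV cache, activations, and framework overhead; the function names are made up for illustration. Real usage will be higher than these figures.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def generation_seconds(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens at a steady rate."""
    return n_tokens / tokens_per_second


if __name__ == "__main__":
    n_params = 7e9  # assumption: "7B" taken as exactly 7 billion parameters
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: ~{weight_memory_gb(n_params, bits):.1f} GB")
    # A 200-token reply at the 1.5 tokens/s mentioned above
    print(f"200 tokens at 1.5 tok/s: ~{generation_seconds(200, 1.5):.0f} s")
```

By this estimate only the 4-bit weights (about 3.5 GB) leave headroom on a 6 GB card, and a 200-token reply at 1.5 tokens per second takes a bit over two minutes.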