|
|
|
|
|
by khafra
615 days ago
|
|
If you want to set up an AI server for your own use, it's exceedingly easy to install LM Studio and hit the "serve an API" button. Testing performance this way, I got about 0.5-1.5 tokens per second with an 8GB 4bit quantized model on an old DL360 rack-mount server with 192GB RAM and 2 E5-2670 CPUs. I got about 20-50 tokens per second on my laptop with a mobile RTX 4080. |
|