by mfalcon
881 days ago
I love how simple Ollama makes it to download different models and consume them through its REST API. I've never used it in a "production" environment, though. Does anyone know how Ollama performs there, or is it better to move to something like vLLM for that?
Try, for example, setting 'num_gpu' to 99 and 'use_mlock' to true.
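
For anyone wondering where those two settings go: as far as I know they can be passed per request in the "options" object of Ollama's REST API. Here is a rough sketch, assuming a local server on the default port 11434 and a model called "llama3" that has already been pulled (both are my assumptions, not from the thread). The helper names are made up for illustration, and newer Ollama versions may ignore or drop some options, so treat it as a starting point rather than a recipe.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt, num_gpu=99, use_mlock=True):
    """Build the JSON body for /api/generate with the suggested options.

    num_gpu is the number of layers to offload to the GPU; a large value
    such as 99 is a common way to ask for "as many as possible".
    use_mlock asks the runtime to keep the model locked in RAM.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a stream
        "options": {"num_gpu": num_gpu, "use_mlock": use_mlock},
    }


def generate(model, prompt, url=OLLAMA_URL):
    """Send the request and return the generated text (needs a live server)."""
    body = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Usage, with a server running and the hypothetical model name "llama3":
#   print(generate("llama3", "Say hello in one sentence."))
```

Whether this is enough for "production" still depends on your hardware and on how many concurrent requests you need to serve, so it is worth measuring both with and without these options.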