by mfalcon 881 days ago
I love how simple Ollama makes it to download different models and consume them through its REST API. I've never used it in a "production" environment, though. Does anyone know how Ollama performs there, or is it better to move to something like vLLM for that?
2 comments

The performance will probably be similar as long as you remember to tune the settings listed here: https://github.com/ollama/ollama/blob/main/docs/api.md

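For example, try setting 'num_gpu' to 99 (which offloads as many layers as possible to the GPU) and 'use_mlock' to true (which keeps the model locked in RAM so it isn't swapped out).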
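In case it helps, here is a rough sketch of what passing those two settings per request could look like from Python. It assumes Ollama is listening on its default local address and that your version still accepts 'num_gpu' and 'use_mlock' under "options" as described in the linked api.md. The helper names and the "llama3" model name are placeholders I made up for illustration, not part of Ollama.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(model, prompt, num_gpu=99, use_mlock=True):
    """Build the JSON body for /api/generate with the tuning options set.

    Hypothetical helper: the option names come from the comment above and
    the linked api.md; check that your Ollama version still supports them.
    """
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a stream
        "options": {
            "num_gpu": num_gpu,      # number of layers to offload to the GPU
            "use_mlock": use_mlock,  # lock model memory to avoid swapping
        },
    }


def generate(model, prompt, url=OLLAMA_URL):
    """POST the payload to a running Ollama server and return the reply text."""
    body = json.dumps(build_generate_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example call (needs a running server and a pulled model; "llama3" is a placeholder):
# print(generate("llama3", "Why is the sky blue?"))
```

Setting the options per request like this keeps the experiment easy to undo; if they help, the same parameters can also be baked into a Modelfile so every client gets them.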

They all probably already use elements of deep learning, but are very likely trained in a supervised way to output structured data (i.e. actions).
+1