by activatedgeek 939 days ago
I looked at Ollama before, but couldn't quite figure something out from the docs [1].

It looks like a lot of the tooling is heavily engineered for a set of popular, modern LLM-style models. It also looks like llama.cpp supports LoRA models, so I'd assume there is a way to build a pipeline from LoRA to llama.cpp deployments, which probably covers quite a broad set of possibilities.

Beyond llama.cpp, can someone point me to what the broader community uses for general PyTorch model deployments?

I have never self-hosted a model, and am really keen to try. Ideally, I am looking for something that stays close to the PyTorch core and therefore gives me the flexibility to take any nn.Module to production.

[1]: https://github.com/jmorganca/ollama/blob/main/docs/import.md