by semmulder 872 days ago
FYI, vLLM also just added experimental multi-LoRA support: https://github.com/vllm-project/vllm/releases/tag/v0.3.0

Also check out the new prefix caching, I see huge potential for batch processing purposes there!
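For anyone wondering why prefix caching matters for batch jobs, here is a toy sketch of the idea. It is not vLLM's implementation: `ToyPrefixCache`, the block size, and the string stand-ins for cached KV blocks are all hypothetical, and a real engine caches attention key/value tensors rather than strings. The point is only that prompts sharing a long prefix (say, one system prompt across a whole batch) can skip recomputing the shared blocks.

```python
from typing import Dict, List, Tuple


class ToyPrefixCache:
    """Toy model of prefix caching (hypothetical, not vLLM's code).

    Each full block of tokens is keyed by the entire prefix ending at that
    block, so a block is reused only when everything before it matches too.
    """

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        # prefix tokens -> stand-in for the cached KV block
        self.blocks: Dict[Tuple[int, ...], str] = {}
        self.computed_blocks = 0

    def process(self, tokens: List[int]) -> int:
        """Return how many full blocks of this prompt were cache hits."""
        hits = 0
        full = len(tokens) - len(tokens) % self.block_size
        for end in range(self.block_size, full + 1, self.block_size):
            key = tuple(tokens[:end])
            if key in self.blocks:
                hits += 1  # shared prefix: reuse the earlier work
            else:
                self.blocks[key] = f"kv[{end - self.block_size}:{end}]"
                self.computed_blocks += 1  # new block: pay the compute cost
        return hits


# Two requests that share an 8-token "system prompt" but differ afterwards.
cache = ToyPrefixCache(block_size=4)
system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]
first_hits = cache.process(system_prompt + [9, 10, 11, 12])
second_hits = cache.process(system_prompt + [13, 14, 15, 16])
print(first_hits, second_hits, cache.computed_blocks)
```

In this toy run the second request reuses both blocks of the shared prefix and only computes its own suffix block, which is where the batch-processing savings would come from.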


Yes, we (LoRAX devs) saw that (we know the author pretty well). It's a useful addition, though quite a bit simpler than our level of support for multi-LoRA inference. We're planning on doing a more comprehensive comparison soon, now that it's officially out.

I will say that if you want to explore the forefront of multi-LoRA inference, it's definitely worth giving LoRAX a look. We just added support for per-request model merging (https://predibase.github.io/lorax/guides/merging_adapters/), for example, and we plan to keep doubling down on this idea of combining adapters in some pretty unique ways.
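To make "merging adapters" a bit more concrete, here is a minimal sketch of one simple strategy, a linear weighted merge of LoRA updates. This is an illustration under assumptions, not LoRAX's code: the helper names (`matmul`, `lora_delta`, `merge_linear`) are hypothetical, the matrices are tiny plain-Python lists, and the linked guide is the place to look for the merge strategies LoRAX actually supports. Each adapter contributes a low-rank update `B @ A` to a base weight, and the merge is a weighted sum of those updates.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x r) and b (r x m)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_delta(B: Matrix, A: Matrix, scale: float = 1.0) -> Matrix:
    """Low-rank weight update contributed by one adapter: scale * (B @ A)."""
    return [[scale * x for x in row] for row in matmul(B, A)]


def merge_linear(deltas: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of adapter updates (one possible merge strategy)."""
    if len(deltas) != len(weights):
        raise ValueError("need exactly one weight per adapter")
    rows, cols = len(deltas[0]), len(deltas[0][0])
    merged = [[0.0] * cols for _ in range(rows)]
    for delta, w in zip(deltas, weights):
        for i in range(rows):
            for j in range(cols):
                merged[i][j] += w * delta[i][j]
    return merged


# Two hypothetical rank-1 adapters for a 2x2 base weight.
delta_a = lora_delta(B=[[1.0], [0.0]], A=[[2.0, 0.0]])
delta_b = lora_delta(B=[[0.0], [1.0]], A=[[0.0, 4.0]])

# The merge weights would be chosen per request in this sketch.
merged = merge_linear([delta_a, delta_b], weights=[0.5, 0.5])

base = [[1.0, 1.0], [1.0, 1.0]]
effective = [[base[i][j] + merged[i][j] for j in range(2)] for i in range(2)]
print(merged, effective)
```

Because the base weight is never modified, different requests could pick different adapters and weights against the same loaded model, which is the appeal of doing the merge per request.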

Missed this, thanks.

Everything is moving so fast!