by fazkan
622 days ago
We have vLLM in certain production instances, and it is a pain on most non-NVIDIA architectures. After a bit of digging around, we realized that most of it is just a wrapper on top of PyTorch function calls. If we can do without the batch processing that vLLM supports, we are fine, and that is what we did here.
Also, there is a Dockerfile.rocm at the root of vLLM's repo. How is it a pain?