| HN Mirror

Certainly! We'd like our good designs to be picked up by frameworks and serve all users. Currently, Punica is built on top of PyTorch and HuggingFace Transformers ecosystems. Therefore, vLLM and LMDeploy, which are also in the PyTorch ecosystem, should have a smooth adaption. As for Nvidia Triton and TensorRT-LLM, since our kernels are written in CUDA, I believe it will also work seamlessly.

We call for the open source community to help us integrate Punica with all frameworks, thus the whole society can benefit from the efficiency improvement!