Hacker News
Serving LLM 24x Faster on the Cloud with vLLM and SkyPilot (blog.skypilot.co)
12 points by zhwu 1093 days ago
1 comment

Another vLLM post... It's cool, but I still can't tell if it's SOTA. Vanilla transformers LLaMA is a weak baseline to compare against, especially when there are quantized backends like ExLlama, GPTQ, llama.cpp, TVM Llama, and (I think) JAX Llama and Torch-MLIR Llama.