Serving LLM 24x Faster on the Cloud with vLLM and SkyPilot

Another vLLM post... It's cool, but I still can't tell if it's SOTA. Vanilla Transformers LLaMA is far from optimal, especially given quantized backends like ExLlama, GPTQ, llama.cpp, TVM Llama, and (I think) JAX Llama and Torch-MLIR Llama.