Hacker News
Cascade Inference: Memory Bandwidth Efficient Shared Prefix Batch Decoding (flashinfer.ai)
2 points by zhye 860 days ago