Hacker News
Cutting LLM Batch Inference Time by Half with Dynamic Prefix Bucketing (daft.ai)
2 points by DISCURSIVE 216 days ago