Hacker News
Cutting LLM Batch Inference Time by Half with Dynamic Prefix Bucketing (daft.ai)
2 points by DISCURSIVE 216 days ago