Hacker News
Speeding up LLM Inference with parallel decoding (twitter.com)
1 point by pgspaintbrush 1028 days ago