Hacker News new | ask | show | jobs
by whimsicalism 188 days ago
I imagine you have to start decoding many speculative completions in parallel to have true low latency.