Hacker News
by brrrrrm 610 days ago
fake it.

add some latency to the first token and then "stream" at the rate you received tokens even though the entire thing (or some sizable chunk) has been generated. that'll give you the buffer you need to seem fast while also staying safe.
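
a rough sketch of the idea in Python, not anyone's production code. it assumes "staying safe" means running some check over the full output before showing any of it. `fake_stream`, `record_gaps`, `is_safe` and the refusal string are all made-up names for illustration. this is the "entire thing" variant: drain the generator, record the gap before each token, check the text, then replay with the same gaps.

```python
import time
from typing import Callable, Iterable, Iterator, List, Tuple


def record_gaps(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.monotonic,
) -> List[Tuple[str, float]]:
    """Drain the upstream token source, noting the delay before each token."""
    recorded = []
    last = clock()
    for tok in tokens:
        now = clock()
        recorded.append((tok, now - last))
        last = now
    return recorded


def fake_stream(
    tokens: Iterable[str],
    is_safe: Callable[[str], bool],  # hypothetical check over the full text
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
    refusal: str = "[response withheld]",  # placeholder message, an assumption
) -> Iterator[str]:
    """Buffer everything, check it, then replay at the original pace."""
    recorded = record_gaps(tokens, clock)
    text = "".join(tok for tok, _ in recorded)
    if not is_safe(text):
        # Nothing has been shown yet, so the whole response can be swapped out.
        yield refusal
        return
    for tok, gap in recorded:
        sleep(gap)  # reproduce the pacing the tokens arrived with
        yield tok
```

the catch with buffering everything is that time to first token becomes the full generation time. buffering only a sizable chunk and checking chunk by chunk would cut that, at the cost of a check that only sees partial text.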