Hacker News new | ask | show | jobs
by awayto 1098 days ago
I think the nature/feel of the quick response is likely due to the fact the completion call is streaming "stream: true" [1]. Then, they are parsing it back to text as it streams the response back.

[1] https://github.com/steven-tey/novel/blob/62d32be43dae8aa40fa...

1 comments

I'd have been more surprised if there wasn't streaming. All the LLMs I have ever used, stream tokens in real-time. I'm talking about the time spent per token.