| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by awayto 1098 days ago
	I think the nature/feel of the quick response is likely due to the fact the completion call is streaming "stream: true" [1]. Then, they are parsing it back to text as it streams the response back. [1] https://github.com/steven-tey/novel/blob/62d32be43dae8aa40fa...

1 comments

LoganDark 1098 days ago

I'd have been more surprised if there wasn't streaming. All the LLMs I have ever used, stream tokens in real-time. I'm talking about the time spent per token.

link