HN Mirror



	by alew1 1170 days ago
	But the model ultimately still has to process the comma, the newline, the "job". Is the main time savings that this can be done in parallel (on a GPU), whereas in typical generation it would be sequential?

1 comment

Yes. If you look at the biggest models on the OpenAI and Anthropic APIs, the prompt tokens are significantly cheaper than the response tokens.
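
To make the parallel-versus-sequential point concrete, here is a rough sketch that counts forward passes under a deliberately simplified cost model. The function names, the segment layout, and the token counts are all made up for illustration. It assumes the prompt takes one batched pass, every generated token takes one sequential pass, and a run of fixed tokens (the comma, the newline, the "job") can be fed in as a single batched pass instead of being generated one by one.

```python
from typing import List, Tuple

# A template is a list of (kind, n_tokens) segments.
# "fixed" = tokens we already know (punctuation, keys, boilerplate).
# "gen"   = tokens the model actually has to choose.
Segment = Tuple[str, int]


def plain_passes(segments: List[Segment]) -> int:
    """Ordinary generation: one pass for the prompt, then one
    sequential pass per output token, fixed or not."""
    return 1 + sum(n for _, n in segments)


def guided_passes(segments: List[Segment]) -> int:
    """Template filling (simplified): one pass for the prompt, one
    batched pass per fixed run, one sequential pass per generated token."""
    passes = 1
    for kind, n in segments:
        passes += n if kind == "gen" else 1
    return passes


# Hypothetical template: 4 fixed tokens, 3 generated, 5 fixed, 2 generated.
segments = [("fixed", 4), ("gen", 3), ("fixed", 5), ("gen", 2)]

print("plain: ", plain_passes(segments))
print("guided:", guided_passes(segments))
```

The model still processes every token in both cases. What changes is how many of those tokens sit on the sequential critical path, which is the same reason batched prompt tokens can be priced lower than response tokens.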