Hacker News
by
alew1
1123 days ago
But the model ultimately still has to process the comma, the newline, the "job". Is the main time savings that this can be done in parallel (on a GPU), whereas in typical generation it would be sequential?
sebzim4500
1122 days ago
Yes. If you look at the biggest models on the OpenAI and Anthropic APIs, prompt (input) tokens are priced significantly lower than response (output) tokens.
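A toy sketch of the distinction discussed above, assuming a made-up causal "model" (`forward`, `VOCAB` and the token IDs are all hypothetical, not a real LLM). Tokens that are already known, such as the comma, newline and "job" in a template, can be scored in one forward pass, because each position's prediction depends only on tokens that are already fixed. Generated tokens need one pass each, since every new token depends on the one before it. Plain Python does not actually run anything in parallel; the point is that the per-position computations in a single pass are independent, which is what lets a GPU do them at once. No KV cache is modeled.

```python
VOCAB = 50  # hypothetical vocabulary size


def forward(tokens):
    """One forward pass of a hypothetical causal model.

    Returns a next-token prediction for every position. Position i only sees
    tokens[:i + 1] (causal masking), so each entry is independent of the others
    and could be computed in parallel.
    """
    return [(sum(tokens[: i + 1]) * 31 + i) % VOCAB for i in range(len(tokens))]


def process_one_at_a_time(tokens):
    """Feed known tokens one per pass, as naive token-by-token decoding would."""
    preds = [forward(tokens[: i + 1])[-1] for i in range(len(tokens))]
    return preds, len(tokens)  # one sequential pass per token


def process_in_one_pass(tokens):
    """Feed all known tokens at once (prefill): one pass covers every position."""
    return forward(tokens), 1


def generate(prompt, n_new):
    """Generate unknown tokens: each needs the previous output, so passes are sequential."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(forward(tokens)[-1])  # greedy pick of the next token
    return tokens[len(prompt):], n_new


if __name__ == "__main__":
    fixed = [7, 3, 12]  # hypothetical IDs standing in for ",", "\n", "job"
    seq, seq_steps = process_one_at_a_time(fixed)
    par, par_steps = process_in_one_pass(fixed)
    assert seq == par  # same predictions either way
    print(seq_steps, par_steps)  # prints "3 1"
    print(generate(fixed, 2))  # 2 new tokens cost 2 sequential passes
```

Both paths produce identical predictions for the known tokens; only the number of sequential steps differs, which is the kind of saving the question above is asking about.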