by KeplerBoy 1144 days ago
That's a lot of requests.

Not that it matters for the calculation, but I wonder how long such a request (ingesting 32k tokens and responding with a similar amount) would take.

At the speed of regular ChatGPT, that would take a good while.

1 comment

Prompt processing scales quadratically with the context size (assuming OpenAI is still using a standard transformer architecture), but it's also fast compared to generating tokens, because the whole prompt is processed in parallel as one batch while output tokens are produced one at a time. So I wouldn't expect effective response times to go up quadratically. At most linearly, depending on the details of how they implement inference.
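To put rough numbers on that, here's a toy count of attention query-key pairs. It's only a sketch under stated assumptions: plain causal self-attention with a KV cache, nothing specific to OpenAI's actual inference stack, and it ignores the MLP layers, memory bandwidth, and batching across users. The function names are made up for illustration.

```python
def prefill_attention_pairs(prompt_len: int) -> int:
    """Query-key pairs scored when the whole prompt goes through one causal pass.

    Token i attends to tokens 1..i, so the total is 1 + 2 + ... + prompt_len.
    All of these can be computed in parallel in a single forward pass.
    """
    return prompt_len * (prompt_len + 1) // 2


def decode_step_attention_pairs(context_len: int) -> int:
    """Query-key pairs scored for ONE generated token (assumes a KV cache).

    The single new query attends to every position currently in the context.
    Steps like this must run one after another, one per output token.
    """
    return context_len


if __name__ == "__main__":
    for n in (4_000, 8_000, 16_000, 32_000):
        print(
            f"context={n:>6}  "
            f"prefill pairs={prefill_attention_pairs(n):>12}  "
            f"pairs per decode step={decode_step_attention_pairs(n):>6}"
        )
```

Doubling the context roughly quadruples the prefill pair count, but that work lands in one parallel pass. The per-token decode cost only doubles, and the number of sequential steps is set by the output length, which is why wall-clock time doesn't have to follow the quadratic compute curve.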