Hacker News
by GaggiX 1 day ago
Well, with a standard autoregressive model you can generate, for example, 256 tokens in one forward pass if you have 256 users. With this approach you can generate 256 tokens for a single user, but you need several forward passes.

So the diffusion process takes more GFLOPs for the same number of tokens. If you have enough users, batching already lets you balance memory and compute.
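To put rough numbers on that, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the 1B parameter count, the 8 denoising steps, and the usual "about 2 FLOPs per parameter per token" rule of thumb (which ignores attention and KV-cache costs). It is not a measurement of any real model.

```python
# Rough cost model: FLOPs for one forward pass over `tokens_in_pass` tokens.
# Assumes ~2 FLOPs per parameter per token (ignores attention and KV cache).
def forward_flops(params: int, tokens_in_pass: int) -> int:
    return 2 * params * tokens_in_pass


# Autoregressive decoding: each pass emits one new token per user in the batch.
def autoregressive_cost(params: int, users: int, tokens_per_user: int):
    passes = tokens_per_user
    flops = passes * forward_flops(params, users)
    tokens_out = users * tokens_per_user
    return passes, flops, tokens_out


# Diffusion-style decoding: every pass touches the whole block for one user,
# and the block is only finished after `denoise_steps` passes.
def diffusion_cost(params: int, block_len: int, denoise_steps: int):
    passes = denoise_steps
    flops = passes * forward_flops(params, block_len)
    tokens_out = block_len
    return passes, flops, tokens_out


PARAMS = 1_000_000_000  # hypothetical 1B-parameter model
STEPS = 8               # hypothetical number of denoising steps

ar = autoregressive_cost(PARAMS, users=256, tokens_per_user=1)
diff = diffusion_cost(PARAMS, block_len=256, denoise_steps=STEPS)

print("autoregressive (passes, flops, tokens):", ar)
print("diffusion      (passes, flops, tokens):", diff)
print("FLOPs ratio for the same 256 tokens:", diff[1] // ar[1])
```

Under these assumptions both approaches process 256 tokens per pass, so each pass reuses the loaded weights equally well. The difference is that the diffusion side pays for that pass once per denoising step, so the FLOPs ratio is simply the step count.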

1 comment

Batching is a fair counterpoint.