Hacker News
by adwn 601 days ago
How about just serving more clients in parallel? I don't see why human reading speed should pose any kind of upper bound.

And then there are use cases like OpenAI's o1, where most tokens aren't even generated for the benefit of a human, but as input to the model itself.