by adwn
601 days ago
How about just serving more clients in parallel? I don't see why human reading speed should impose any kind of upper bound. And then there are use cases like OpenAI's o1, where most tokens aren't even generated for a human to read, but as input to the model itself.