by diggan
388 days ago
> The most important AI applications being deployed in enterprise today—agents, code generation, and complex reasoning—are bottlenecked by inference latency

Is this really true today? I don't work in enterprise, so I don't know what things look like there, but I'm sure lots of people here do, and it feels unlikely that inference latency is the top bottleneck, even above humans or waiting for human input. Maybe I'm just using LLMs very differently from how they're deployed in an enterprise, but I'm by far the biggest bottleneck in my setup currently.
Ideally I could just run the prompt 100x and have it pick the best solution later. That’s prohibitively expensive and a waste of time today.