|
|
|
|
|
by hemangjoshi37a
69 days ago
|
|
The interesting question isn't "can CF run agent inference" — it's what the routing layer needs to look like for multi-turn workflows. Shipping agent systems to enterprise clients the last year, the bottleneck is never raw tokens/sec. It's (a) state checkpointing betweentool calls, (b) cold-start latency on embedding/rerank models, (c) rate-limit coordination across concurrent agent loops. Does CF expose per-session state, or still stateless-per-request? Without that, you end up building the interesting part yourself. |
|