|
|
|
|
|
by jasonsb
197 days ago
|
|
> Typical savings: 60-90% on most requests, since Gemini Flash is often free/cheapest, but you still get Claude or GPT-4 when needed. This claim seems overstated. Accurately routing arbitrary prompts to the cheapest viable model is a hard problem. If it were reliably solvable, it would fundamentally disrupt the pricing models of OpenAI and Anthropic. In practice, you'd either sacrifice quality on edge cases or end up re-running failed requests on pricier models anyway, eating into those "savings". |
|