|
|
|
|
|
by condiment
118 days ago
|
|
At 16k tokens/s why bother routing? We're talking about multiple orders of magnitude faster and cheaper execution. Abundance supports different strategies. One approach: Set a deadline for a response, send the turn to every AI that could possibly answer, and when the deadline arrives, cancel any request that hasn't yet completed. You know a priori which models have the highest quality in aggregate. Pick that one. |
|