by samhoss93
20 days ago
Agree. At high concurrency, you are better off spending the compute budget on parallel requests rather than draft prediction. The challenging part is that most deployments don't have static traffic profiles. A configuration that was right at launch may no longer be correct months later, and there is no signal that tells you when you have crossed the threshold.