by throwdbaaway
128 days ago
If you ask someone knowledgeable at r/LocalLLaMA about an inference configuration that can increase TG by *up to* 2.5x, in particular for a sample prompt that reads "*Refactor* this module to use dependency injection", then the answer is, of course, speculative decoding. You don't have to work for a frontier lab to know that. You just have to be GPU poor.