by rfoo
483 days ago
... and batching does not help: batch more requests and you get more KV cache to load, so you're still memory-access bound. MLA made it possible to cache a smaller form of k/v, which mitigates the problem (but doesn't completely solve it: at shorter contexts and smaller batches it's still memory-access bound).
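
A rough back-of-the-envelope sketch of the above, in Python. The dimensions (128 heads, head dim 128, a 512-wide cached latent plus 64 RoPE dims, fp16) are illustrative assumptions, not numbers from this comment, and the function names are made up for the example. It only counts cache bytes per layer and ignores weights and compute.

```python
# Hypothetical helpers: count KV-cache bytes read per decode step, per layer.

def kv_bytes_per_token_mha(n_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """Plain multi-head attention caches one K and one V vector per head."""
    return 2 * n_heads * head_dim * bytes_per_elem


def kv_bytes_per_token_mla(latent_dim: int, rope_dim: int, bytes_per_elem: int = 2) -> int:
    """MLA-style caching: one compressed latent plus a small RoPE key part."""
    return (latent_dim + rope_dim) * bytes_per_elem


def cache_bytes_per_step(per_token_bytes: int, batch_size: int, context_len: int) -> int:
    """Each request in the batch reads its own cache, so this is linear in batch size."""
    return per_token_bytes * batch_size * context_len


# Assumed, illustrative dimensions (fp16 = 2 bytes per element).
mha = kv_bytes_per_token_mha(n_heads=128, head_dim=128)
mla = kv_bytes_per_token_mla(latent_dim=512, rope_dim=64)

for batch in (1, 8, 64):
    total = cache_bytes_per_step(mha, batch, context_len=4096)
    # Bytes loaded per request stay constant: batching does not amortize the cache.
    print(batch, total, total // batch)

print("MHA bytes/token/layer:", mha)
print("MLA bytes/token/layer:", mla)
print("reduction factor:", mha / mla)
```

Unlike weights, which are read once and shared across the whole batch, the cache traffic per request does not shrink as the batch grows; a smaller cached form only lowers the constant.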