by jacquesm
1041 days ago
I've done a bunch of optimization of GPU code (in CUDA), and there are typically a few bottlenecks that really matter:

- memory bandwidth
- interconnect bandwidth between the CPU and GPU
- interconnect bandwidth between GPUs
- thermals and power, if you're doing a good job of optimizing the rest

I don't see how a batching mechanism would improve on any of those; superficially, it looks as though it would make matters worse rather than better. Can you explain where the advantage comes from?
- https://groq.com/wp-content/uploads/2020/05/GROQP002_V2.2.pd...
- the "batching" section of https://docs.nvidia.com/deeplearning/tensorrt/archives/tenso...
- https://le.qun.ch/en/blog/2023/05/13/transformer-batching/
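The argument usually made in references like these concerns the first item on the list, memory bandwidth. When a dense layer runs on one input at a time, every weight is fetched from device memory just to do one multiply-add, so the GPU sits waiting on memory. In a batch, each weight fetched is reused for every row, so FLOPs per byte grow roughly linearly with batch size until the layer becomes compute-bound. The back-of-envelope roofline sketch below illustrates this. Its layer size (4096×4096, fp16) and GPU figures (100 TFLOP/s, 1 TB/s) are hypothetical, and it ignores caches and kernel overheads.

```python
"""Back-of-envelope roofline for one dense layer y = x @ W.

Assumptions (hypothetical, for illustration only): fp16 storage (2 bytes per
element), W is read from device memory once per batch, activations are read
and written once, and caches and kernel-launch overheads are ignored.
"""


def arithmetic_intensity(batch: int, d_in: int, d_out: int, bytes_per_elem: int = 2) -> float:
    """FLOPs performed per byte moved to or from device memory."""
    flops = 2 * batch * d_in * d_out  # one multiply + one add per weight per row
    bytes_moved = bytes_per_elem * (
        d_in * d_out        # weights: read once, shared by every row in the batch
        + batch * d_in      # input activations
        + batch * d_out     # output activations
    )
    return flops / bytes_moved


def attainable_flops(batch: int, d_in: int, d_out: int,
                     peak_flops: float, mem_bw: float) -> float:
    """Roofline model: throughput is capped by compute or by memory bandwidth."""
    return min(peak_flops, arithmetic_intensity(batch, d_in, d_out) * mem_bw)


if __name__ == "__main__":
    # Hypothetical GPU figures, not any specific product.
    PEAK = 100e12  # FLOP/s
    BW = 1e12      # bytes/s
    D = 4096
    for b in (1, 8, 64, 256):
        ai = arithmetic_intensity(b, D, D)
        tput = attainable_flops(b, D, D, PEAK, BW)
        print(f"batch={b:4d}  FLOP/byte={ai:7.1f}  attainable={tput / 1e12:6.1f} TFLOP/s")
```

Under these assumptions, batch 1 manages about 1 FLOP per byte and is limited to roughly 1% of peak. At batch 256, the same weight traffic feeds enough arithmetic to saturate compute. This addresses only the memory-bandwidth item; it does not make the CPU-GPU or GPU-GPU interconnects any faster.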