by chessgecko
865 days ago
This is wrong: whether you're memory bound or not depends on the dimensions of the matrices being multiplied (if you're on tensor cores). https://docs.nvidia.com/deeplearning/performance/dl-performa... Some of the things being done to improve the quality of 6-8 bit inference use extra compute and push it a little in the other direction, but it's still pretty memory intensive until the batch size gets quite large.
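To make the dimension argument concrete, here is a rough roofline-style sketch. It assumes each of the three matrices moves through memory exactly once and that every element has the same width. The function names, the 4096x4096 weight shape, and the peak compute and bandwidth figures are illustrative placeholders of mine, not numbers for any particular GPU.

```python
def gemm_arithmetic_intensity(m, n, k, bytes_per_element=2.0):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n].

    Simplifying assumption: A and B are each read once and C is written
    once, all at the same element width (2 bytes = FP16).
    """
    flops = 2 * m * n * k  # one multiply and one add per (m, n, k) triple
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


def is_memory_bound(m, n, k, peak_flops, bandwidth, bytes_per_element=2.0):
    """True when the GEMM's intensity is below the hardware ops:byte ratio."""
    ridge = peak_flops / bandwidth
    return gemm_arithmetic_intensity(m, n, k, bytes_per_element) < ridge


# Hypothetical accelerator: placeholder figures, not a real spec sheet.
PEAK_FLOPS = 300e12   # FLOP/s
BANDWIDTH = 1.5e12    # bytes/s, so the ridge point is 200 FLOPs per byte

if __name__ == "__main__":
    # m plays the role of batch size (tokens processed at once) against
    # a hypothetical 4096 x 4096 weight matrix.
    for batch in (1, 8, 64, 512, 4096):
        ai = gemm_arithmetic_intensity(batch, 4096, 4096)
        bound = is_memory_bound(batch, 4096, 4096, PEAK_FLOPS, BANDWIDTH)
        label = "memory bound" if bound else "compute bound"
        print(f"batch={batch:5d}  intensity={ai:8.1f} FLOPs/byte  {label}")
```

With these placeholder numbers the small-batch cases sit far below the ridge point, and only the larger batches cross it. Shrinking `bytes_per_element` raises the intensity for the same shapes, though this sketch ignores the extra dequantization compute that low-bit schemes add.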