|
|
|
|
|
by storystarling
153 days ago
|
|
It sounds like those workloads are memory bandwidth bound. In my experience with generative models, the compute units end up waiting on VRAM throughput, so throwing more wattage at the cores hits diminishing returns very quickly. |
|