by acoard
498 days ago
> Because memory bandwidth is the #1 bottleneck for inference, even more than capacity.

But there are a ton of models I can't run locally at all because of VRAM limitations. I'd happily take being able to run those models, even if more slowly. I know there are ways to get them running on the CPU at orders of magnitude lower speed, but ideally there'd be some sort of middle ground.
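To make the bandwidth-versus-capacity trade-off concrete, here is a rough back-of-envelope sketch. It assumes decoding is purely memory-bound (every weight byte is read once per generated token) and that a "middle ground" means keeping as much of the model as fits in VRAM and leaving the rest in system RAM. The function name and all bandwidth and size figures are hypothetical illustrations, not measurements or any tool's API; compute time, KV cache, and PCIe transfers are ignored.

```python
def tokens_per_second(model_gb: float, vram_gb: float,
                      gpu_bw_gbps: float, cpu_bw_gbps: float) -> float:
    """Hypothetical upper bound on decode speed for a model split between
    GPU and CPU memory, assuming each weight byte is read once per token."""
    # Put as much of the model in VRAM as fits; the remainder lives in system RAM.
    on_gpu = min(model_gb, vram_gb)
    on_cpu = model_gb - on_gpu
    # Time per token is the time to stream each portion at its memory bandwidth.
    seconds_per_token = on_gpu / gpu_bw_gbps + on_cpu / cpu_bw_gbps
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    # Hypothetical numbers: a 40 GB model, ~1000 GB/s GPU memory, ~50 GB/s system RAM.
    for vram in (48, 24, 0):
        rate = tokens_per_second(40, vram, 1000, 50)
        print(f"VRAM {vram:>2} GB: ~{rate:.1f} tokens/s")
```

Under these assumptions, the split case lands between the all-GPU and all-CPU extremes, and the slow CPU-resident portion dominates the per-token time even when most of the model is on the GPU.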