|
|
|
|
|
by fatihturker
101 days ago
|
|
One question I’m particularly curious about: At what point does SSD bandwidth become the main bottleneck for inference when weights are heavily compressed? If anyone has experience with streaming layers or low-bit runtimes, would love to hear how you approach it. |
|