Hacker News new | ask | show | jobs
by fatihturker 101 days ago
One question I’m particularly curious about:

At what point does SSD bandwidth become the main bottleneck for inference when weights are heavily compressed? If anyone has experience with streaming layers or low-bit runtimes, would love to hear how you approach it.