| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by fatihturker 101 days ago
	One question I’m particularly curious about: At what point does SSD bandwidth become the main bottleneck for inference when weights are heavily compressed? If anyone has experience with streaming layers or low-bit runtimes, would love to hear how you approach it.