
	by jiggawatts 117 days ago
	The next-gen inference chips will use High Bandwidth Flash (HBF) to store model weights. HBF is made similarly to HBM, but it is lower power and much higher capacity. It can also be used for caching, which reduces costs when processing long chat sessions.
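
	For a rough sense of why capacity matters here, a back-of-envelope sketch. The model shape below (parameter count, layers, KV heads, head dimension, precisions) is a made-up example, not any specific chip or model, and the "cache" is assumed to mean the per-session key/value cache of a standard transformer:

```python
GIB = 1024 ** 3


def weight_bytes(num_params: int, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights."""
    return num_params * bytes_per_param


def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int) -> int:
    """Key/value cache size for one session of `tokens` tokens.

    The factor 2 covers one key and one value vector per layer.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


if __name__ == "__main__":
    # Hypothetical model: 70B parameters stored at 1 byte each (8-bit).
    weights = weight_bytes(70_000_000_000, 1)

    # Hypothetical shape: 80 layers, 8 KV heads of dim 128, 16-bit values,
    # and a 100k-token chat session.
    cache = kv_cache_bytes(tokens=100_000, layers=80, kv_heads=8,
                           head_dim=128, bytes_per_value=2)

    print(f"weights:  {weights / GIB:.1f} GiB")
    print(f"KV cache: {cache / GIB:.1f} GiB per 100k-token session")
```

	Under those assumptions the cache for a single long session is a sizable fraction of the weights themselves, so keeping it around between turns instead of recomputing it needs a lot of cheap capacity.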