| HN Mirror



	by Insanity 130 days ago
	This is a pretty cool project! Essentially this is like using swap memory to extend your RAM, but in a 'smart' way so you don't overload the NVMe unnecessarily. I do wonder how the 'smarts' pan out in practice, because putting a ton of stress on your NVMe during generation is probably not the best choice for its longevity.

2 comments

zozbot234 130 days ago

This is not putting any stress or wear on the NVMe; it's a pure read workload.


tatef 130 days ago

Yes, exactly this.


embedding-shape 130 days ago

> but in a 'smart' way so you don't overload the NVMe unnecessarily

"overloading NVMe"? What is that about? First time I've heard anything about it.

> because putting a ton of stress on your NVMe during generation

Really shouldn't "stress your NVMe", something is severely wrong if that's happening. I've been hammering my SSDs forever, and while write operations "hurt" the longevity of the flash cells themselves, the controller interface really shouldn't be affected by this at all, unless I'm missing something here.


tatef 130 days ago

Hypura reads tensor weights from the GGUF file on NVMe into RAM/GPU memory pools, then compute happens entirely in RAM/GPU.

There is no writing to the SSD during inference with this architecture.
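
To illustrate what a pure read workload looks like at the file level, here is a minimal Python sketch. It is not Hypura's actual code: the helper names (`make_demo_file`, `read_slice`, `try_write`) are made up for this example, and the temp file is just dummy bytes standing in for a GGUF file. It maps the file read-only, copies a slice into RAM, and shows that a write through the mapping is rejected.

```python
import mmap
import os
import tempfile


def make_demo_file():
    """Create a small dummy file (a stand-in for a weights file, NOT real GGUF)."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(range(256)) * 16)  # 4096 bytes of known content
    return path


def read_slice(path, offset, length):
    """Copy `length` bytes at `offset` into RAM via a read-only memory map."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return bytes(m[offset:offset + length])


def try_write(path):
    """Return True if a write through the read-only map is rejected."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            try:
                m[0] = 0  # attempt to modify the mapped file
            except TypeError:
                return True  # read-only maps refuse modification
    return False


if __name__ == "__main__":
    path = make_demo_file()
    try:
        print(read_slice(path, 4, 4))
        print("write rejected:", try_write(path))
    finally:
        os.remove(path)
```

The only write in the sketch is creating the dummy file; once the data is in RAM, everything after that happens on the in-memory copy.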


embedding-shape 130 days ago

Even if there were a ton of writing, I'm not sure where NVMe even comes into the picture. Write durability is about the flash cells on SSDs and has nothing to do with the interface. Someone correct me if I'm wrong.


Insanity 130 days ago

I had assumed the controller would generate a lot of heat if it's continuously reading. But maybe it's not actually bad.


throwway120385 130 days ago

Just pop a heatsink on it and call it good.
