HN Mirror



	by yalogin 483 days ago
	It’s not clear to me where and how the current popular systems fall short. Do they talk about it anywhere? Also, what specifically are the data access patterns for training and inference that differ from traditional use cases?

jpgvm 483 days ago

Well, the current popular systems are pretty much limited to Lustre and the new kid, Weka; mostly Lustre though, tbh.

You can try to use "standard" options like MinIO, Ceph (RADOS), or SeaweedFS, but you will very quickly learn that those systems aren't remotely fast enough for these use cases.

AI training is what this is used for, not inference (which has absolutely no need for any filesystem at all). What makes the workload somewhat special is that it's entirely random reads and not cacheable at all, since most reads are one-and-done.
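A toy simulation makes the "not cacheable" point concrete. It assumes (this is not from the thread) that the loader reshuffles the whole dataset every epoch and that storage sits behind a plain LRU cache; `lru_hit_rate` is a hypothetical helper:

```python
import random
from collections import OrderedDict


def lru_hit_rate(num_samples: int, cache_size: int, epochs: int, seed: int = 0) -> float:
    """Simulate an LRU cache in front of a dataset read in shuffled order.

    Each epoch visits every sample exactly once in a fresh random order,
    mimicking a shuffled training data loader.
    """
    rng = random.Random(seed)
    cache = OrderedDict()  # sample id -> None, ordered from least to most recently used
    hits = 0
    total = 0
    for _ in range(epochs):
        order = list(range(num_samples))
        rng.shuffle(order)
        for sample in order:
            total += 1
            if sample in cache:
                hits += 1
                cache.move_to_end(sample)  # mark as most recently used
            else:
                cache[sample] = None
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict the least recently used sample
    return hits / total


if __name__ == "__main__":
    print(lru_hit_rate(10_000, 1_000, 1))  # single epoch: no sample is ever read twice
    print(lru_hit_rate(10_000, 1_000, 5))  # cache holds 10% of the data
    print(lru_hit_rate(100, 100, 4))       # cache holds the whole dataset
```

One shuffled epoch gives a hit rate of exactly 0.0 whatever the cache size, because nothing repeats. Over several epochs, a cache much smaller than the dataset still gets almost nothing; only a cache that holds the entire dataset helps (`lru_hit_rate(100, 100, 4)` returns 0.75, i.e. every epoch after the first is served from cache).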

Would Lustre be perfectly fine at 6 TiB/s? Yes. Is it a huge pain in the ass to operate and make remotely highly available? Also yes. If this thing is capable of the throughput but is easier to operate, and generally more modern and less baroque, it's probably an improvement. TL;DR: Lustre is fast, but that is literally its only redeeming quality. I have lost far too many hours of my life to the Lustre gods.

rfoo 482 days ago

> What makes the workload somewhat special is

I'll add that latency also doesn't matter that much. You are loading batch n+1 on the CPU while the GPUs are churning through batch n-1 and copying batch n from host memory at the same time.
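The overlap described above is ordinary prefetching. A minimal sketch, assuming a hypothetical `load_batch(i)` callable that stands in for the CPU-side loading work; real training stacks ship their own version of this (for example PyTorch's `DataLoader` with `num_workers` and `prefetch_factor`):

```python
import queue
import threading
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel: the worker has produced every batch


def prefetch(load_batch: Callable[[int], T], num_batches: int, depth: int = 2) -> Iterator[T]:
    """Yield load_batch(0), ..., load_batch(num_batches - 1), loading ahead in a thread.

    While the consumer (the training loop) works on batch i, the worker is
    already loading up to `depth` later batches.
    """
    q = queue.Queue(maxsize=depth)

    def worker() -> None:
        try:
            for i in range(num_batches):
                q.put((True, load_batch(i)))  # blocks once `depth` batches are waiting
        except Exception as exc:
            q.put((False, exc))  # hand loader errors to the consumer
            return
        q.put((True, _DONE))

    threading.Thread(target=worker, daemon=True).start()
    while True:
        ok, item = q.get()
        if not ok:
            raise item  # re-raise the loader's exception in the training loop
        if item is _DONE:
            return
        yield item


if __name__ == "__main__":
    def slow_load(i: int) -> int:
        time.sleep(0.05)  # pretend CPU-side decoding/augmentation
        return i

    start = time.perf_counter()
    for batch in prefetch(slow_load, 10):
        time.sleep(0.05)  # pretend GPU step on this batch
    print(f"elapsed: {time.perf_counter() - start:.2f}s")
```

The bounded queue is the design point: `depth` caps how far the loader may run ahead, so host memory stays bounded, and load latency only becomes visible once loading a batch takes longer than consuming one. On an otherwise idle machine the demo should finish well under the ~1 s a strictly serial load-then-train loop would need.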

So as long as your "load next batch" doesn't take more than about 1 s, you're fine. But a single "load next batch" on one worker means thousands (if not more) of random reads.
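Rough arithmetic for why concurrency matters here, using an assumed ~1 ms per random read (not a figure from the thread): 4,000 reads issued one after another take about 4 s, well over a ~1 s budget, while keeping 64 reads in flight brings that to roughly 4,000 / 64 × 1 ms ≈ 63 ms, provided the storage can actually serve that much parallelism. A Unix-only sketch of issuing one batch's reads concurrently; `read_batch` and the fixed-size record layout are hypothetical:

```python
import os
import random
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def read_batch(path: str, offsets: Sequence[int], record_size: int,
               max_in_flight: int = 64) -> List[bytes]:
    """Read one fixed-size record at each offset, keeping many reads in flight.

    os.pread takes an explicit offset, so threads can share one descriptor
    without racing on a shared file position (Unix only).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
            # map() preserves input order, so results line up with `offsets`.
            return list(pool.map(lambda off: os.pread(fd, record_size, off), offsets))
    finally:
        os.close(fd)


if __name__ == "__main__":
    RECORD = 16  # bytes per record in this toy file
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for i in range(10_000):
            f.write(i.to_bytes(RECORD, "little"))  # record i stores the integer i
        path = f.name
    try:
        ids = random.sample(range(10_000), 4_000)  # one "batch" of random records
        records = read_batch(path, [i * RECORD for i in ids], RECORD)
        assert [int.from_bytes(r, "little") for r in records] == ids
        print(f"read {len(records)} random records")
    finally:
        os.remove(path)
```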

cyanf 482 days ago

They’re using the FS to cache the KV caches of past requests. That’s why they’re able to charge so little on a prompt cache hit.
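One way such a cache could work, sketched under assumptions since the thread doesn't describe the actual layout: key each block-aligned token prefix by a content hash, write the serialized KV cache to a file under that key, and on a new request look up the longest stored prefix so only the remaining tokens need a fresh prefill. `PrefixKVStore`, `BLOCK`, and the one-file-per-prefix layout are all hypothetical:

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional, Sequence, Tuple

BLOCK = 4  # tokens per cache block; illustrative value only


def prefix_key(tokens: Sequence[int]) -> str:
    """Content hash of a token prefix; identical prefixes map to the same file."""
    h = hashlib.sha256()
    for t in tokens:
        h.update(t.to_bytes(4, "little"))
    return h.hexdigest()


class PrefixKVStore:
    """Hypothetical file-backed store mapping token prefixes to serialized KV caches."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, tokens: Sequence[int], kv_blob: bytes) -> None:
        """Store the KV cache for a block-aligned prefix so lookups can find it."""
        if len(tokens) == 0 or len(tokens) % BLOCK:
            raise ValueError("prefix length must be a positive multiple of BLOCK")
        (self.root / prefix_key(tokens)).write_bytes(kv_blob)

    def longest_prefix(self, tokens: Sequence[int]) -> Tuple[int, Optional[bytes]]:
        """Return (cached token count, KV blob) for the longest stored prefix."""
        n = len(tokens) - len(tokens) % BLOCK  # start at the last block boundary
        while n > 0:
            path = self.root / prefix_key(tokens[:n])
            if path.exists():
                return n, path.read_bytes()
            n -= BLOCK
        return 0, None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = PrefixKVStore(Path(d))
        store.put([1, 2, 3, 4], b"kv for 4 tokens")
        n, blob = store.longest_prefix([1, 2, 3, 4, 5, 6])
        print(n, blob)  # 4 b'kv for 4 tokens'
```

A hit replaces recomputing the prefill for the cached tokens with a file read, which is why a store like this wants high read throughput at inference time too.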

jpgvm 482 days ago

Ah, I missed that. Yes, prefix caching and RAG are two cases where you will want something like this at inference time.