HN Mirror



	by ein0p 483 days ago
	Seems like Ceph has considerably lower throughput (https://ceph.io/en/news/blog/2024/ceph-a-journey-to-1tibps/). That's a serious concern when you're saving hundreds of terabytes of weights and optimizer states every now and again, or loading large precomputed prefix KV-caches. MinIO seems to be slower still. IDK about SeaweedFS: they don't mention performance in their selling points at all.

2 comments

ibotty 478 days ago

Look at the hardware first:

The hardware in the Ceph test can handle at most 1.7 TiB/s of traffic (and that is the ideal case, with no overhead whatsoever).

I also assume that the batch size (block size) is different enough that this alone would make a big difference.
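To illustrate why block size alone can matter that much, here is a rough sketch under a deliberately simple model: a single stream where every request pays a fixed overhead before its payload moves at link speed. The function name and all the numbers (100 µs overhead, 10 GB/s link) are made up for illustration and are not taken from either benchmark.

```python
KIB = 1024
MIB = 1024 ** 2

def stream_throughput(block_bytes: int,
                      per_request_overhead_s: float,
                      link_bytes_per_s: float) -> float:
    """Bytes/s for one serial stream of fixed-size requests.

    Each request costs a fixed overhead plus the time needed to move
    its payload over the link.
    """
    time_per_request = per_request_overhead_s + block_bytes / link_bytes_per_s
    return block_bytes / time_per_request

# Hypothetical figures, chosen only to show the shape of the effect.
OVERHEAD_S = 100e-6   # assumed fixed cost per request
LINK_BPS = 10e9       # assumed link speed in bytes per second

small_blocks = stream_throughput(4 * KIB, OVERHEAD_S, LINK_BPS)
large_blocks = stream_throughput(4 * MIB, OVERHEAD_S, LINK_BPS)

print(f"4 KiB blocks: {small_blocks / 1e6:.1f} MB/s")
print(f"4 MiB blocks: {large_blocks / 1e6:.1f} MB/s")
```

With small blocks the fixed overhead dominates, so the same hardware looks far slower. That is why two benchmarks run with different block sizes are hard to compare directly.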


do_not_redeem 483 days ago

It's quite funny that I got two opposite answers right away: you say it's to improve throughput, and sibling says it's to improve latency, and as we know throughput and latency trade off against each other. I'm inclined to agree it's more likely they're prioritizing throughput, since their readme charts throughput but not latency. But OTOH, the project looks like it requires RDMA. I wonder if the authors have written about their motivations and the tradeoffs they made, so we don't have to speculate.

EDIT: Their blog post answered all my questions and more. https://www.high-flyer.cn/blog/3fs/


ein0p 483 days ago

Because the two are interconnected and aren't in conflict with each other. You not only want high throughput - that by itself would be quite limiting. You want it along with low latency as well, or else it's very easy to end up in the situation where your throughput is effectively zero if the access pattern is "bad".
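A back-of-the-envelope way to see this is Little's law: sustained throughput is roughly (requests in flight × request size) / latency. The sketch below uses made-up numbers (4 KiB reads, 2 ms vs. 100 µs latency) and a hypothetical helper name. It models a "bad" access pattern as a chain of dependent small reads, where only one request can be in flight at a time.

```python
KIB = 1024

def effective_throughput(request_bytes: int,
                         latency_s: float,
                         in_flight: int = 1) -> float:
    """Bytes/s sustained by `in_flight` concurrent requests (Little's law)."""
    return in_flight * request_bytes / latency_s

# Hypothetical workload: 4 KiB reads.
REQUEST = 4 * KIB

# Dependent reads (each one needs the previous result): no parallelism.
slow_serial = effective_throughput(REQUEST, latency_s=2e-3, in_flight=1)
fast_serial = effective_throughput(REQUEST, latency_s=100e-6, in_flight=1)

# Independent reads can hide latency by keeping many requests in flight.
slow_parallel = effective_throughput(REQUEST, latency_s=2e-3, in_flight=64)

print(f"serial,   2 ms latency:   {slow_serial / 1e6:.2f} MB/s")
print(f"serial,   100 us latency: {fast_serial / 1e6:.2f} MB/s")
print(f"parallel, 2 ms latency:   {slow_parallel / 1e6:.2f} MB/s")
```

When the access pattern rules out deep queues, latency is the only lever left, so a system with a huge peak throughput but high latency can still deliver very little on that workload.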
