| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by bugbuddy 850 days ago
	Aren’t 10G and 100G connections standard nowadays in data centers? Heck, I thought they were standard 10 years ago.

4 comments

geerlingguy 850 days ago

Datacenters are up to 400 Gbps and beyond (many places are adopting 1+ Tbps on core switching).

However, individual servers may still operate at 10, 25, or 40 Gbps to save cost on the thousands of NICs in a row of racks. Alternatively, servers with multiple 100G connections split that bandwidth allocation up among dozens of VMs so each one gets 1 or 10G.

link

KaiserPro 850 days ago

Yes, but you have to think about contention. Whilst the Top of rack might have 2x400 gig links to the core, thats shared with the entire rack, and all the other machines trying to shout at the core switching infra.

Then stuff goes away, or route congested, etc, etc, etc.

link

pixl97 850 days ago

Bandwidth delay product does not help serialized transactions. If you're reaching out to disk for results, or if you have locking transactions on a table the achievable operations drops dramatically as latency between the host and the disk increases.

link

bee_rider 850 days ago

The typical way to trade bandwidth away for latency would, I guess, be speculative requests. In the CPU world at least. I wonder if any cloud providers have some sort of framework built around speculative disk reads (or maybe it is a totally crazy trade to make in this context)?

link

Nextgrid 850 days ago

You'd need the whole stack to understand your data format in order to make speculative requests useful. It wouldn't surprise me if cloud providers indeed do speculative reads but there isn't much they can do to understand your data format, so chances are they're just reading a few extra blocks beyond where your OS read and are hoping that the next OS-initiated read will fall there so it can be serviced using this prefetched data. Because of full-disk-encryption, the storage stack may not be privy to the actual data so it couldn't make smarter, data-aware decisions even if it wanted to, limiting it to primitive readahead or maybe statistics based on previously-seen patterns (if it sees that a request for block X is often followed by block Y, it may choose to prefetch that next time it sees block X accessed).

A problem in applications such as databases is when the outcome of an IO operation is required to initiate the next one - for example, you must first read an index to know the on-disk location of the actual row data. This is where the higher latency absolutely tanks performance.

A solution could be to make the storage drives smarter - have an NVME command that could say like "search in between this range for this byte pattern" and one that can say "use the outcome of the previous command as a the start address and read N bytes from there". This could help speed up the aforementioned scenario (effectively your drive will do the index scan & row retrieval for you), but would require cooperation between the application, the filesystem and the encryption system (typical, current FDE would break this).

link

treflop 850 days ago

Often times it’s the app (or something high level) that would need speculative requests, which it may not be possible in the given domain.

I don’t think it’s possible in most domains.

link

pixl97 850 days ago

I mean we already have readahead in the kernel.

This said the problem can get more complex than this really fast. Write barriers for example and dirty caches. Any application that forces writes and the writes are enforced by the kernel are going to suffer.

The same is true for SSD settings. There are a number of tweakable values on SSDs when it comes to write commit and cache usage which can affect performance. Desktop OS's tend to play more fast and loose with these settings and servers defaults tend to be more conservative.

link

nixass 850 days ago

400G is fairly normal thing in DCs nowadays

link