| HN Mirror



	by dekhn 4431 days ago
	Most people who try O_DIRECT writes give up quickly, thinking it's "slow". What's actually happening is that you're seeing the write bandwidth the system is really capable of, without any of the 'clever' optimizations like write caching.
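
A rough way to see the difference is to time the same write with and without O_DIRECT. The sketch below is only an illustration under stated assumptions: Linux only, a filesystem that accepts O_DIRECT (tmpfs on older kernels does not), and a 4096-byte alignment that may not match every device. The name `timed_write` and the 1 MiB chunk size are made up for the example.

```python
import mmap
import os
import time

CHUNK = 1 << 20  # 1 MiB per write; an assumed size, a multiple of 4096


def timed_write(path, total_bytes, direct):
    """Write at least total_bytes to path in CHUNK-sized writes; return MB/s.

    With direct=True the file is opened with O_DIRECT, so writes bypass
    the page cache. total_bytes is assumed to be a multiple of CHUNK.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if direct:
        flags |= os.O_DIRECT  # Linux-specific flag
    fd = os.open(path, flags, 0o600)
    # An anonymous mmap is page-aligned, which satisfies the buffer
    # alignment that O_DIRECT normally requires.
    buf = mmap.mmap(-1, CHUNK)
    try:
        buf.write(b"\xab" * CHUNK)
        start = time.perf_counter()
        written = 0
        while written < total_bytes:
            # Short writes are not handled here; a real tool would
            # have to retry while keeping offsets aligned.
            written += os.write(fd, buf)
        os.fsync(fd)  # include flush time so the buffered run is comparable
        elapsed = time.perf_counter() - start
    finally:
        buf.close()
        os.close(fd)
    return written / elapsed / 1e6
```

Calling `timed_write("/some/path/test.bin", 256 * CHUNK, direct=True)` and again with `direct=False` gives two numbers to compare. The `fsync` matters: without it the buffered run mostly measures how fast memory can be copied into the page cache.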

2 comments

StillBored 4430 days ago

I don't think this is accurate. We have a kernel bypass for disk operations. We use our own memory buffers, and bypass the filesystem, block, and SCSI midlayers. Our stuff is basically what O_DIRECT should be.

There are cases where we are 50% faster than O_DIRECT without any "caching". Furthermore, in high bandwidth applications (>4GB/sec) without O_DIRECT it's easy to become CPU limited in the blk/midlayer, so again we win.

Now that said, I haven't tried the latest blk-mq, scsi-mq, etc. patches, which are tuned for higher IOP rates. These patches were driven by people plugging in high performance flash arrays and discovering huge performance issues in the kernel. Still, I expect that if you plug in a couple of high-end flash arrays, the kernel is going to be the limit rather than the IO subsystem on a modern Xeon.


dekhn 4430 days ago

Sure, if you're dealing with extremely high bandwidth apps (4GB/sec is pretty high bandwidth!) what I said doesn't apply.

The number of people who are sustaining 4GB/sec (on a single machine/device array) is pretty small, and they have a reason to go beyond the straightforward approaches the kernel makes available through a simple API (everything you described, like bypass, puts you in a rare category).

Anyway, when I was swapping to SSD, the kswapd process was using 100% of one core while swapping at 500MB/sec. I suspect many kernel threads haven't been CPU-optimized for high throughput.


zobzu 4430 days ago

But that's also why his tests are unreliable in this case.
