by magicalhippo 2237 days ago
I know nothing about io_uring, but looking at the man page[1] of readv, I see it returns the number of bytes read. To me as a developer, that's an unmistakable sign that partial reads are possible.

Was readv changed? The man page also states that partial reads are possible, but I guess that might have been added later?
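Taking that return value at face value, a caller that needs every buffer filled has to loop. A minimal sketch, assuming Linux/glibc; `readv_full` is a hypothetical helper name, not part of any library. After a short count it skips the iovecs that were completely filled and advances into the partially filled one.

```c
#include <errno.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper (not a libc function): keep calling readv() until
 * every iovec is full, EOF is reached, or a real error occurs.
 * Returns the total number of bytes read (short only at EOF), or -1.
 * Note: it modifies the caller's iov array in place. */
ssize_t readv_full(int fd, struct iovec *iov, int iovcnt) {
    ssize_t total = 0;
    while (iovcnt > 0) {
        ssize_t n = readv(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before any data: retry */
            return -1;
        }
        if (n == 0)
            break;                 /* EOF: return the short count */
        total += n;
        /* Skip the iovecs that this call filled completely. */
        while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
            n -= (ssize_t)iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* Advance into the iovec that was only partially filled. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    return total;
}
```

If a single readv() call happens to fill everything, the loop runs once; the extra bookkeeping only matters when the kernel returns a short count.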

If it always returned all the requested bytes in practice, it would hardly be the first case where the current behavior is mistaken for the specification. My fondest memory of that is all the OpenGL 1.x programs that broke when OpenGL 2.x was released.

[1]: http://man7.org/linux/man-pages/man2/readv.2.html


Also, note the preadv2 man page which has a flags field with one flag defined as:

-------------------------------

RWF_NOWAIT (since Linux 4.14) Do not wait for data which is not immediately available. If this flag is specified, the preadv2() system call will return instantly if it would have to read data from the backing storage or wait for a lock. If some data was successfully read, it will return the number of bytes read.

-------------------------------

This implies that "standard" pread/preadv/preadv2 without that flag (which is only available for preadv2) will block waiting for all bytes (or return short on EOF), and that you need to set a flag to get the non-blocking behavior you're describing here. Otherwise the flag would be the inverse, RWF_WAIT, implying that the standard behavior is the non-blocking one rather than the blocking one.

The blocking behavior is what we were expecting (and previously got) out of io_uring, so it was an unpleasant surprise to see the behavior change visible to user-space in later kernels.

> If this flag is specified, the preadv2() system call will return instantly if it would... wait for a lock.

Doesn't this sound a bit different from ordinary short reads?

Receiving EAGAIN usually happens under fairly specific conditions (signal interruption), but I'd imagine that filesystem code has a great deal of locks.

For example, FUSE filesystems can support signal interruptions via EAGAIN, but they are not guaranteed to. You can end up in a situation where a FUSE filesystem hangs and you cannot interrupt the thread that reads from it. I suspect that RWF_NOWAIT is a "fix" for similar situations, not the opposite of the default behavior.

Well, pread/pwrite have the same return values, and historically, for disk reads, they either block or return a device error.

pread only returns a short value on EOF.

Well, the man page does say that "The readv() system call works just like read(2) except that multiple buffers are filled".

If we go to read(2) we find "It is not an error if [the return value] is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now [...], or because read() was interrupted by a signal."

As an outsider, I'd never rely on this returning the requested number of bytes. If I required N bytes, I'd use a read loop.
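The read loop mentioned above might look like this in C. It is a minimal sketch assuming a POSIX system; `read_full` is a hypothetical name, not a standard function. It retries after short reads and after EINTR, and treats a return of 0 as EOF.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read exactly `count` bytes unless EOF or an error
 * intervenes. Short reads and EINTR are retried. Returns the number of
 * bytes read (less than count only at EOF), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t count) {
    size_t done = 0;
    while (done < count) {
        ssize_t n = read(fd, (char *)buf + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* no data transferred; try again */
            return -1;
        }
        if (n == 0)
            break;          /* EOF: caller sees a short count */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

A pread-based variant would look the same, except that it also advances the file offset argument by `done` on each iteration.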

But I do agree that the RWF_NOWAIT flag mentioned in your other comment doesn't help, as it suggests the default is to block.

Well, or EINTR if your signal handlers are not installed with SA_RESTART.

For EINTR it never returns a short read, as the only way to see EINTR is a return of -1 with errno==EINTR.
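To illustrate the EINTR case: a read that is blocked with no data yet transferred, interrupted by a handler installed without SA_RESTART, returns -1 with errno set to EINTR rather than a short count. A minimal sketch assuming Linux/POSIX; `blocking_read_with_alarm` and `on_alarm` are hypothetical names, and an empty pipe stands in for a slow device.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static void on_alarm(int sig) { (void)sig; /* only needs to interrupt */ }

/* Hypothetical demo: block in read() on an empty pipe until SIGALRM
 * arrives. The handler is installed WITHOUT SA_RESTART, so read() is not
 * restarted and fails with -1/EINTR. Returns read()'s result and stores
 * errno in *saved_errno; returns -2 if setup fails. */
ssize_t blocking_read_with_alarm(int *saved_errno) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                  /* deliberately not SA_RESTART */
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -2;

    int p[2];
    if (pipe(p) < 0)
        return -2;
    char c;
    alarm(1);                         /* SIGALRM in about one second */
    ssize_t n = read(p[0], &c, 1);    /* blocks: empty pipe, writer open */
    *saved_errno = errno;
    alarm(0);
    close(p[0]);
    close(p[1]);
    return n;
}
```

With SA_RESTART set in sa_flags, the kernel would restart this read() after the handler returns and it would keep blocking, which is why the sketch only exercises the non-restart case.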

We handle signals fine.

Sure, I was imprecise. A signal can cause a read to return a short result.