| It was really interesting how this was found. A user started describing file corruption when copying to/from Windows with the io_uring VFS module loaded. Tests using the Linux kernel cifsfs client and the Samba libsmbclient libraries/smbclient user-space transfer utility couldn't reproduce the problem, neither could running Windows against Samba on Ubuntu 19.04. What turned out to be happening was a combination of things. Firstly, the kernel changed so an SMB2_READ request against Samba with io_uring loaded was sometimes hitting a short read, where some of file data was already in the buffer cache, so io_uring now returned a short read to smbd. We returned this to the client, as in the SMB2 protocol it isn't an error to return a short read, the client is supposed to check read returns and then re-issue another read request for any missing bytes. The Linux kernel cifsfs client and Samba libsmbclient/smbclient did this correctly. But it turned out that Windows10 clients and MacOSX Catalina (maybe earlier versions of clients too, I don't have access to those) clients have a horrible bug, where they're not checking read returns when doing pipeline reads. When trying to read a 10GB file for example, they'll issue a series of 1MB reads at 1MB boundaries, up to their SMB2 credit limit, without waiting for replies. This is an excellent way to improve network file copy performance as you fill the read pipe without waiting for reply latency - indeed both Linux cifsfs and smbclient do exactly the same. But if one of those reads returns a short value, Windows10 and MacOSX Catalina DON'T GO BACK AND RE-READ THE MISSING BYTES FROM THE SHORT READ REPLY !!!! This is catastrophic, and will corrupt any file read from the server (the local client buffer cache fills the file contents I'm assuming with zeros - I haven't checked, but the files are corrupt as checked by SHA256 hashing anyway). 
That's how we discovered the behavior and ended up leading back to the io_uring behavior change. And that's why I hate it when kernel interfaces expose changes to user-space :-). |
This is interesting and somewhat surprising, since Windows I/O is internally asynchronous and completion-based, and AFAIK file system drivers are not allowed to return a short read except at EOF.
And actually, even on Linux, file systems are not supposed to return short reads, right? Even on a signal? Since user apps don't expect it? (And thus it's not surprising that io_uring's change broke user apps.)
So it wouldn't be surprising to learn that the Windows SMB server never returns short reads, and thus it's interesting that the protocol would allow it. Do you know what the purpose of this is?
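
For reference, the client-side handling described in the quote (check each read's return, then re-issue a read for any missing bytes) can be sketched roughly as below. This is only an illustration, not code from Samba, cifsfs, or any real client: `read_fully`, `make_short_reader`, and the `read_at(offset, length)` callable are hypothetical names, and the "server" is just an in-memory byte string that caps every reply to simulate short reads.

```python
def read_fully(read_at, offset, length):
    """Read `length` bytes starting at `offset`, re-issuing after short reads.

    `read_at(offset, length)` is a hypothetical callable that may return
    fewer bytes than requested; an empty result is treated as end of file.
    """
    chunks = []
    remaining = length
    while remaining > 0:
        data = read_at(offset, remaining)
        if not data:
            # Nothing came back at all: assume EOF and stop.
            break
        chunks.append(data)
        # A short read only advances by what actually arrived, so the
        # next request starts exactly at the first missing byte.
        offset += len(data)
        remaining -= len(data)
    return b"".join(chunks)


def make_short_reader(blob, max_bytes):
    """Simulate a server that never returns more than `max_bytes` per read."""
    def read_at(offset, length):
        return blob[offset:offset + min(length, max_bytes)]
    return read_at


if __name__ == "__main__":
    blob = bytes(range(256)) * 16            # 4096 bytes of test data
    reader = make_short_reader(blob, 100)    # every reply is capped at 100 bytes
    data = read_fully(reader, 0, 1024)
    print(len(data), data == blob[:1024])
```

A client that skips the loop and assumes each reply holds the full requested length ends up with a gap after the first short reply, which matches the corruption described in the quote.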