|
|
|
|
|
by maple3142
763 days ago
|
|
Not sure if this is a bit off topic or not, but I recently encountered a problem where my program are continuously calling write to a socket in a loop that loops N times, with each of them sending about a few hundred bytes of data representing an application-level message. The loop can be understanded as some "batched messages" to server. After that, the program will try to receive data from server and do some processing. The problem is that if N is above certain limit (e.g. 4), the server will resulting in some error saying that the data is truncated somehow. I want to make N larger because the round-trip latency is already high enough, so being blocked by this is pretty annoying. Eventually, I found an answer on stackoverflow saying that setting TCP_NODELAY can fix this, and it actually magically enable be to increase N to a larger number like 64 or 128 without causing issues. Still not sure why TCP_NODELAY can fix this issue and why this problem happens in the first place. |
|
With TCP_NODELAY and small messages, this works out fine. Every message is contained in a single packet, and the userspace buffer being read into is large enough to contain it. As such, whenever the kernel has any data to give to userspace, it has an integer number of messages to give. Nothing requires the kernel to respect that, but it will not go out of its way to break it.
In contrast, without TCP_NODELAY, messages get concatenated and then fragmented based on where packet boundaries occur. Now, the natural end point for a call to recv() is not the message boundary, but the packet boundary.
The server is supposed to see that it is in the middle of a message, and make another call to recv() to get the rest of it; but clearly it does not do that.