The X protocol is not particularly well-designed for this. (The X protocol is not particularly well-designed for lots of things.)
One of mosh's advantages is that it does screen rendering on the server, and then synchronizes the screen with the client. So, if there's been ten screenfuls of outputs, and those packets were lost or delayed or the connection couldn't keep up, the client only needs to receive what's currently on the screen. (This doesn't work in the other direction, but there's typically a lot less traffic in that direction.)
You can't quite do that with X, since the X protocol itself involves no way to get / diff / synchronize the contents of the screen -- it involves a bunch of rendering operations, so if there's a delay, it needs to re-send those operations.
One of mosh's advantages is that it does screen rendering on the server, and then synchronizes the screen with the client. So, if there's been ten screenfuls of outputs, and those packets were lost or delayed or the connection couldn't keep up, the client only needs to receive what's currently on the screen. (This doesn't work in the other direction, but there's typically a lot less traffic in that direction.)
You can't quite do that with X, since the X protocol itself involves no way to get / diff / synchronize the contents of the screen -- it involves a bunch of rendering operations, so if there's a delay, it needs to re-send those operations.
Still, there's an open ticket: https://github.com/keithw/mosh/issues/41