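That would work, but it would perform terribly. A large number of effects need shared memory tricks to work efficiently in X, and those tricks are not available when the X client and server sit on opposite ends of a network connection. So, any "modern" desktop is going to be iffy running that way.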
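To make the shared memory point concrete: the MIT-SHM extension only helps when the client and the X server are on the same machine, and you can usually guess that from the DISPLAY variable. Here is a rough sketch of that check. The function names are made up for illustration, and the parsing is a heuristic that only covers the common forms (`:0`, `unix:0`, `host:10.0`), not every DISPLAY syntax out there.

```python
def parse_display(display):
    """Split an X11 DISPLAY string into (host, display_number, screen).

    Heuristic: handles forms like ":0", "unix:0", and "host:10.0".
    """
    host, _, rest = display.rpartition(":")
    number, _, screen = rest.partition(".")
    return host, int(number), int(screen) if screen else 0


def shm_plausible(display):
    """Return True when DISPLAY looks like a local Unix-socket connection.

    A host part (even "localhost", which is what SSH X11 forwarding
    typically sets) means the traffic goes over TCP, so the client
    cannot share memory segments with the X server.
    """
    host, _, _ = parse_display(display)
    return host in ("", "unix")
```

Feeding it the value of DISPLAY from a forwarded session (something like `localhost:10.0`) returns False, which is the case where every frame has to be pushed through the socket instead.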
For Windows, try MobaXterm. I tried half a dozen X11 servers a few years ago, and it was the best by far.
It works great, modulo 3D acceleration. If I remember right, cut and paste works well, so that (plus a file share for Downloads) gets rid of the need to run a web browser in Linux.
Similarly, it can use the Windows window manager to manage the X11 windows, so you automatically bypass the Linux compositor.
[edit: Also, Hyper-V is extremely fast in this type of setup, because they focus on server performance, and this workload looks like any other network/IO-intensive server workload]