If you find this interesting, you might also be interested in this video of someone diving even deeper into how to make the dither surface stable: https://www.youtube.com/watch?v=HPqGaIMVuLs
IMO, the holy grail of 3d dithering is yet to be achieved. runevision's method does not handle surfaces viewed at sharp angles very well. I've thought a lot about a method with fractal adaptive blue noise and analytic anisotropic filtering but I don't yet have the base knowledge to implement it.
My take on it is to use some arbitrary dithering algorithm (e.g. floyd-steinberg, blue noise thresholding, doesn't really matter) for the first frame, and for subsequent frames:
1. Turn the previous dithered framebuffer into a texture
2. Set the UV coordinates for each vertex to their screenspace coordinates from the previous frame
3. Render the new frame using the previous-framebuffer texture and aforementioned UV coords, with nearest-neighbor sampling and no lighting etc. (this alone should produce an effect reminiscent of MPEG motion tracking gone wrong).
4. Render the new frame again using the "regular" textures+lighting to produce a greyscale "ground truth" frame.
5. Use some annealing-like iterative algorithm to tweak the dithered frame (moving pixels, flipping pixels) to minimize perceptual error between that and the ground truth frame. You could split this work into tiles to make it more GPU-friendly.
Steps 4+5 should hopefully turn it from "MPEG gone wrong" into something coherent.
This really is a fantastic video. I don't think I'd considered many of the ideas behind dithering before seeing how it could be extrapolated to this degree.
The video ends in a place where I suspect even further advances could still be made.
Fair point, though I think that when it's low rez enough, it becomes less apparent that it's not in screenspace, and it gets closer to a retro look: https://youtu.be/EzjWBmhO_1E?t=102