Essentially Xlib is latency limited because it's a synchronous protocol. XCB can help if you redesign your applications for it, but we are talking about old applications (Motif...)
You know XCB and Xlib are just different client libraries for accessing the exact same wire protocol, right?
Moreover, notice that the example they gave there is atom interning, which is a round trip on the wire (though you can batch those even in Xlib...), and they say "most real applications will see less benefit than this" since that's the worst case. Most applications do their atom work at startup, not in the main use cycle, and that main cycle is famously both async and buffered in Xlib (it used to be a FAQ in Xlib tutorials reminding people to run the event loop, and it complicates error handling a bit).
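To put rough numbers on why atom interning is the worst case, here is a back-of-the-envelope cost model in C. It is only a sketch: the function names are made up for illustration, and it assumes round-trip latency dominates (transfer time and server processing are ignored). It compares waiting for each reply before sending the next request against sending everything first and collecting the replies afterwards.

```c
#include <stddef.h>

/* Hypothetical cost model, not measured data. Both function names are
 * illustrative and do not come from Xlib or XCB. */

/* Blocking style: send one request, wait for its reply, repeat.
 * This is the pattern of calling XInternAtom in a loop, so each
 * request pays a full round trip. */
double blocking_cost_ms(size_t n_requests, double rtt_ms)
{
    return (double)n_requests * rtt_ms;
}

/* Pipelined style: send all requests, then read all replies.
 * XInternAtoms in Xlib works this way, as does issuing XCB requests
 * and fetching the replies from their cookies later. The replies
 * overlap in flight, so the total wait is roughly one round trip. */
double pipelined_cost_ms(size_t n_requests, double rtt_ms)
{
    return n_requests == 0 ? 0.0 : rtt_ms;
}
```

With assumed figures of 50 atoms and a 2 ms round trip, blocking_cost_ms(50, 2.0) gives 100 ms while pipelined_cost_ms(50, 2.0) gives about 2 ms. Requests that need no reply at all, like most drawing in the main loop, never pay this cost in either library, which is why the atom example overstates the gain for typical applications.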