The first version always leaves a "clean" state, that is both indices points to actual array locations. A mentally "clean" state makes understanding easier. For the third version one has to keep in mind the wrap around behavior of computer specific integers throughout the comprehension process, so it is a bit more difficult (to understand).
The reasoning comes down to how you use it. I use ringbuffers for ultra low latency buffering of market data for instance. If my ringbuffer is so full that I'm worried about its length approaching its capacity then I'm doing something wrong and I should be willing to lose the data. 1 element isn't going to make the difference.
The real reason to stick with the first approach is that your static analysis tools won't freak out that you have intentional unsigned int overflow. Heck, some compilers will now scream at you for doing this. Then what happens when someone goes to port your code to a language with stricter overflow behavior? It won't work.
IMO even in realtime systems, I don't use this. Heck, the linux kernel even uses the original version.
And you don't have expend any mental energy on the integer overflow edge case. It should be handled by using a bitmask and a power-of-2 sized array, should.