Yeah, I suppose that's true, too. You've got to do the conversion at some point. I don't know that you get any benefit from storing the text, transforming it to support whatever ops (deconflicting, etc.), and then transforming it back to text again vs. just storing it in the intermediate format. Ideally, this would all be transparent to the user anyway.
For one merge, yes. The fun starts when you have a sequence of merges.
CRDTs put IDs on tokens, so things are a bit more deterministic.
Imagine a variable rename or a whitespace change; either one messes up text diffing completely.
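To make that concrete, here's a rough Python sketch, not how any particular CRDT library does it. The token store, the small integer IDs, and the helpers `render` and `changed_ids` are all made up for illustration; real CRDTs use richer position identifiers. It applies a rename plus a whitespace change and compares what a line diff sees with what an ID-based comparison sees.

```python
import difflib

# Hypothetical token store: stable id -> token text. Real CRDTs use richer
# position identifiers; small integers stand in for them here.
base = {1: "total", 2: "=", 3: "price", 4: "+", 5: "tax"}

def render(tokens, sep=" "):
    """Serialize tokens to text in id order, joined by `sep`."""
    return sep.join(tokens[i] for i in sorted(tokens))

def changed_ids(old, new):
    """Ids whose token text differs, or that exist on only one side."""
    return sorted(i for i in old.keys() | new.keys() if old.get(i) != new.get(i))

# One edit renames a variable; the other only changes whitespace on output.
renamed = {**base, 1: "grand_total"}
before = render(base)             # "total = price + tax"
after = render(renamed, sep="")   # "grand_total=price+tax"

# Text view: the whole line is reported as removed and re-added.
text_changes = [
    line for line in difflib.unified_diff([before], [after], lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

# Token view: only the renamed token shows up; whitespace is not a token.
token_changes = changed_ids(base, renamed)

print(text_changes)
print(token_changes)
```

The line diff flags the entire line, so a second person touching `tax` on that same line would conflict with the rename. In the token view only id 1 changed, which is the kind of determinism the IDs buy you.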