The answer to that is a huge part of the NLP field. The current answer is that you break the string down into constituent parts (tokens) and map each of them into a high-dimensional space. “cat” becomes a large vector whose coordinates are continuous, so functions of that vector can be differentiated. “the cat” probably becomes a pair of vectors.
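A minimal sketch of that mapping, in case it helps. Everything here is made up for illustration: the three-word vocabulary, the 4-dimensional vectors, the random initial values, and the helper names make_embedding_table and embed. Real systems learn the table and use subword tokens rather than splitting on spaces.

```python
import random


def make_embedding_table(vocab, dim, seed=0):
    """Give every word in the vocabulary a random vector of length `dim`."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


def embed(text, table):
    """Split on whitespace and look each token up in the table."""
    return [table[token] for token in text.split()]


# Hypothetical toy vocabulary; a real model would have tens of thousands of entries.
table = make_embedding_table(["the", "cat", "sat"], dim=4)

# "the cat" becomes a pair of vectors, one per token.
vectors = embed("the cat", table)
print(len(vectors), "vectors of length", len(vectors[0]))
```

The string itself is still discrete; only the vectors it is looked up into are continuous.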
It's weirder than that. You are typically differentiating a loss function of strings and various opaque weights. You optimize that loss over the weight space, so only in some informal, contrived sense are you actually differentiating with respect to the string.
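To make the distinction concrete, here is a rough sketch with a toy squared-error loss. The loss, the 3-dimensional vector standing in for "cat", the target, and the step size are all assumptions picked for illustration; gradients are written out by hand rather than taken from any autodiff library.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def loss(weights, embedding, target):
    """Toy loss: squared error of a linear score against a target."""
    return (dot(weights, embedding) - target) ** 2


def grad_weights(weights, embedding, target):
    """dL/dweights: this is what training actually follows."""
    err = dot(weights, embedding) - target
    return [2.0 * err * x for x in embedding]


def grad_embedding(weights, embedding, target):
    """dL/dembedding: the closest thing to a derivative 'with respect to the string'."""
    err = dot(weights, embedding) - target
    return [2.0 * err * w for w in weights]


cat = [0.5, -1.0, 2.0]           # hypothetical vector standing in for "cat"
initial_weights = [0.1, 0.2, 0.3]
target = 1.0

# Plain gradient descent over the weight space; the string's vector stays fixed.
weights = list(initial_weights)
step = 0.05
for _ in range(100):
    g = grad_weights(weights, cat, target)
    weights = [w - step * gi for w, gi in zip(weights, g)]

print("loss before:", loss(initial_weights, cat, target))
print("loss after: ", loss(weights, cat, target))
print("gradient w.r.t. the 'cat' vector:", grad_embedding(initial_weights, cat, target))
```

The last gradient says how the loss would change if the vector for "cat" moved, which is not the same as saying how it would change if the string did.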
Sometimes there are other, better ways to describe "how does changing x affect y". Derivatives are powerful, but they are not the only possible description of such relationships.
I'm very excited for what other things future "compilers" will be able to do to programs besides differentiation. That's just the beginning.
If you were dealing with e.g. English words rather than arbitrary strings, one approach would be to treat each word as a point in n-dimensional space. Then you can use continuous (and differentiable) functions to output into that space.
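A small sketch of what outputting into that space could look like. The 2-dimensional coordinates for the three words are invented, and snapping the output back to the nearest word is my own assumption about how you would read the result, not something the approach requires.

```python
import math

# Hypothetical 2-D coordinates; real word vectors have hundreds of dimensions.
WORDS = {
    "cat": [1.0, 0.0],
    "dog": [0.8, 0.6],
    "car": [-1.0, 0.2],
}


def blend(a, b, t):
    """A continuous, differentiable function of t that outputs a point in word space."""
    return [(1.0 - t) * ai + t * bi for ai, bi in zip(a, b)]


def nearest_word(point, words):
    """Read a point back as a word by picking the closest entry (Euclidean distance)."""
    return min(words, key=lambda w: math.dist(point, words[w]))


# Slide from "cat" towards "car" and see which word each point lands nearest to.
for t in (0.0, 0.1, 0.9, 1.0):
    point = blend(WORDS["cat"], WORDS["car"], t)
    print(t, point, nearest_word(point, WORDS))
```

The blend is smooth in t, but the final nearest-word step is a discrete jump, so gradients only make sense up to that last lookup.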