Plenty of training data to go on, I'd imagine.
To me, it seems no different in kind from image or audio generation.