I think there's already research on "TTS after NLG" that does this, since an NLG system can export meta-info about emphasis in addition to the text (at least in the case of non-end-to-end NLG systems).
Whether that makes a big difference in practice, I don't know.
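To make the idea concrete, here is a rough sketch of how such emphasis info could be handed to a TTS engine, assuming the NLG side emits (text, emphasis) pairs and the TTS side accepts SSML. The `to_ssml` helper and the pair format are made up for illustration; only the `<speak>` and `<emphasis level="...">` elements come from the SSML standard.

```python
from xml.sax.saxutils import escape

def to_ssml(chunks):
    """Turn hypothetical NLG output into SSML.

    chunks: list of (text, level) pairs, where level is None or an
    SSML emphasis level ("strong", "moderate", "reduced", "none").
    """
    parts = []
    for text, level in chunks:
        safe = escape(text)  # escape &, <, > so the markup stays well-formed
        if level is None:
            parts.append(safe)
        else:
            parts.append(f'<emphasis level="{level}">{safe}</emphasis>')
    return "<speak>" + " ".join(parts) + "</speak>"

# Assumed NLG output: the system knows "nine" is the contrastive item.
nlg_output = [("The train leaves at", None), ("nine,", "strong"), ("not ten.", None)]
ssml = to_ssml(nlg_output)
print(ssml)
```

How much of that markup a given TTS voice actually honors is engine-specific, so this only shows the hand-off, not the audible effect.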