You can specify two pairs of images (content + annotation each) and it'll transfer the style from one to the other as consistently as possible. The downside is that you need to find an algorithm, neural network, or person to create the annotations. (We're working on training a network for that, for portraits only.)
These examples are from the paper above; here's a direct link for convenience: