It’s quite clever.
Matching styles using vision models is not that easy, especially if you want to capture the core subject of the prompt.