You probably also need an enormous GPU (24GB of VRAM) to train as large a model as possible and get the best generalization you can (there are so many different types of objects, surfaces and fabrics, and combinations of them).
It's deep learning. It has little to do with any analytical model, and it doesn't think like a human :-(. Recently, even good NLP models need 24GB+ for training (they won't fit into 16GB), so good-quality colorization (no color spills, natural colors) could be expected to be just as demanding.
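A rough back-of-the-envelope sketch of why memory caps model size, under stated assumptions: fp32 training with an Adam-style optimizer keeps about four copies of every parameter (weights, gradients, and two moment buffers, so 16 bytes per parameter). The 50% of memory reserved for activations is a hypothetical figure chosen for illustration. Real usage depends heavily on image size and batch size, and activations are often the dominant term for U-Nets.

```python
# Rough lower bound on GPU memory for training in fp32 with an Adam-style optimizer.
# Assumptions (illustrative only): 4 bytes per value; each parameter is stored as
# weights + gradients + two optimizer moment buffers. Activations are only
# approximated by reserving a hypothetical fraction of the memory budget.

BYTES_PER_FLOAT32 = 4
COPIES_WITH_ADAM = 4  # weights, gradients, first moment, second moment


def param_state_bytes(num_params: int) -> int:
    """Bytes needed just to hold the parameters and their optimizer state."""
    return num_params * BYTES_PER_FLOAT32 * COPIES_WITH_ADAM


def max_params_for_budget(budget_gib: float, activation_fraction: float = 0.5) -> int:
    """Largest parameter count that fits once `activation_fraction` of the
    memory budget is set aside for activations (a hypothetical split)."""
    usable_bytes = budget_gib * (1024 ** 3) * (1.0 - activation_fraction)
    return int(usable_bytes // (BYTES_PER_FLOAT32 * COPIES_WITH_ADAM))


if __name__ == "__main__":
    # Compare the card from the quoted article (11GB) with 16GB and 24GB cards.
    for gib in (11, 16, 24):
        n = max_params_for_budget(gib)
        print(f"{gib} GiB -> roughly {n / 1e6:.0f}M parameters")
```

Under these assumptions the estimate scales linearly with memory, so going from an 11GB card to a 24GB one roughly doubles the parameter budget. That is the intuition behind "bigger card, bigger model".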
From the article:
"BEEFY Graphics card. I'd really like to have more memory than the 11 GB in my GeForce 1080TI (11GB). You'll have a tough time with less. The Unet and Critic are ridiculously large but honestly I just kept getting better results the bigger I made them."