This is essentially the premise behind Generative Adversarial Networks, and if you've seen the results, they're astounding. They're much better for specialized tasks than their generalized GPT counterparts.
GANs pair a generative model with a classification model (both unsupervised) whose loss functions have been designed to be antithetical. Basically, one performing well means the other is performing poorly. Keeping with the example posed by the given link, this results in a kind of hyper-optimization that causes the generative model to gradually hone in on the perfect way to render a face, while the classification model keeps pace with it and feeds back that "I don't see a face" until something resembling a face emerges. With this approach, you can start with complete noise and end up at a photorealistic face.
I'm not sure that's a valid statement on either count. There is plenty of work being done to bolster GANs with diffusion, in an attempt to take GANs where they couldn't before. Here's one such example: https://arxiv.org/abs/2206.02262
You might've been more correct to say that diffusion surpassed prior generative models, but the adversarial element doesn't even compare to diffusion at all. The adversarial element would be more accurately seen as a trade-off for standard RLHF/Human-in-the-Loop models.
I will bet money that GANs bolstered with diffusion will far outperform a standalone diffusion model.