by gyom 1807 days ago

Part of the cleverness of GANs was to have found a way to train a neural network that generates data without explicitly modeling the probability density.

In a stats textbook, when you know that your training data comes from a normal distribution, you can maximize the likelihood wrt the parameters (that is, compute the MLE), and then sample from the fitted distribution. That's basic theory.
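A minimal sketch of that textbook recipe, assuming 1-D data and a made-up training set (the true parameters 2.0 and 0.5, and the helper names, are hypothetical choices for illustration):

```python
import math
import random

def fit_normal_mle(data):
    """Closed-form maximum-likelihood estimates for a 1-D normal distribution."""
    n = len(data)
    mu = sum(data) / n
    # The MLE of the variance divides by n, not n - 1.
    var = sum((x - mu) ** 2 for x in data) / n
    return mu, math.sqrt(var)

def sample_normal(mu, sigma, k, rng):
    """Draw k new samples from the fitted density."""
    return [rng.gauss(mu, sigma) for _ in range(k)]

rng = random.Random(0)
# Hypothetical training set: 5000 draws from N(2.0, 0.5^2).
train = [rng.gauss(2.0, 0.5) for _ in range(5000)]
mu_hat, sigma_hat = fit_normal_mle(train)
new_samples = sample_normal(mu_hat, sigma_hat, 10, rng)
```

Here the density is explicit: fitting gives two numbers, and sampling only needs those two numbers. That is the step that gets hard once the data are images.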

In practice, it was very hard to learn a good pdf for experimental data when you had a training set of images. GANs provided a way to bypass this.

Of course, people could have said "hey, let's generate samples without maximizing a log-likelihood first", but they didn't know how to do it properly: how to train the network in any way other than minimizing cross-entropy (which is equivalent to maximizing log-likelihood).
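To make the cross-entropy remark concrete, here is a small check for the categorical case with one-hot targets. The predictions and labels are invented for illustration:

```python
import math

def cross_entropy(target, pred):
    """Cross-entropy H(target, pred) between two discrete distributions."""
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)

def log_likelihood(label, pred):
    """Log-probability the model assigns to the observed label."""
    return math.log(pred[label])

# Hypothetical model outputs for three examples over three classes.
preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
labels = [0, 1, 2]
one_hot = [[1.0 if j == y else 0.0 for j in range(3)] for y in labels]

mean_ce = sum(cross_entropy(t, p) for t, p in zip(one_hot, preds)) / len(labels)
mean_nll = -sum(log_likelihood(y, p) for y, p in zip(labels, preds)) / len(labels)
```

The two quantities agree, so minimizing one is the same as maximizing the average log-likelihood.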

Then GANs actually provided a new loss function that could be trained. Total paradigm shift!
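As a rough sketch, assuming the original minimax formulation, the two losses can be computed from the discriminator's outputs alone, with no density for the data ever evaluated. The probabilities below are hypothetical discriminator outputs, not results from a trained model:

```python
import math

def discriminator_loss(d_real, d_fake):
    """Negated discriminator objective: -[mean log D(x) + mean log(1 - D(G(z)))]."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)

def generator_loss(d_fake):
    """Minimax generator loss: mean log(1 - D(G(z))), which the generator minimizes."""
    return sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)

# D outputs the probability that its input is real.
confused = discriminator_loss([0.5, 0.5], [0.5, 0.5])      # D cannot tell real from fake
confident = discriminator_loss([0.9, 0.95], [0.1, 0.05])   # D separates them well
```

The generator's loss drops as the discriminator is fooled more often, which is the training signal that replaces the likelihood.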

whimsicalism 1807 days ago

I'm on board with all of this. I think even before GANs it was becoming popular to optimize losses that weren't necessarily a log-likelihood.

But I'm confused by the usage of the phrase generative model, which I took to always mean a probabilistic model of the joint distribution that can be sampled from. I get that GANs generate data samples, but it seems different.

hervature 1807 days ago

This is the problem when people use technical terms loosely and interchangeably with their English definitions. Generative model classifiers are precisely as you describe. They model a joint distribution that one can sample.

A GAN cannot even fit this definition because it is not a classifier. It is composed of a generator and a discriminator. The discriminator is a discriminative classifier. The generator is, well, a generator. It has nothing to do with generative model classifiers. Then you get some variation of "neural network generator" > "model that generates" > "generative model". This leads to confusion.

nl 1807 days ago

I find https://openai.com/blog/generative-models/ pretty good on this. Reading from "More general formulation" we see:

Now, our model also describes a distribution $\hat{p}_{\theta}(x)$ (green) that is defined implicitly by taking points from a unit Gaussian distribution (red) and mapping them through a (deterministic) neural network — our generative model (yellow). Our network is a function with parameters $\theta$, and tweaking these parameters will tweak the generated distribution of images. Our goal then is to find parameters $\theta$ that produce a distribution that closely matches the true data distribution (for example, by having a small KL divergence loss). Therefore, you can imagine the green distribution starting out random and then the training process iteratively changing the parameters $\theta$ to stretch and squeeze it to better match the blue distribution.
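A toy sketch of that implicit definition, assuming a one-unit tanh "network" in place of a real one (the function names and parameter values are made up for illustration):

```python
import math
import random

def generator(z, theta):
    """Toy deterministic 'network': one tanh unit followed by an affine map."""
    w, scale, shift = theta
    return scale * math.tanh(w * z) + shift

def sample(theta, k, rng):
    """Push unit-Gaussian noise through the generator; no density is ever written down."""
    return [generator(rng.gauss(0.0, 1.0), theta) for _ in range(k)]

rng = random.Random(0)
theta_a = (1.0, 1.0, 0.0)  # hypothetical parameters
theta_b = (1.0, 2.0, 3.0)  # tweaked parameters shift and stretch the output distribution
xs_a = sample(theta_a, 2000, rng)
xs_b = sample(theta_b, 2000, rng)
mean_a = sum(xs_a) / len(xs_a)
mean_b = sum(xs_b) / len(xs_b)
```

Changing theta changes the distribution of the samples, even though the code can only draw from that distribution and never evaluates its pdf.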

This is precisely a generative model in the probabilistic sense. The section on VAEs spells this out even more explicitly:

For example, Variational Autoencoders allow us to perform both learning and efficient Bayesian inference in sophisticated probabilistic graphical models with latent variables (e.g. see DRAW, or Attend Infer Repeat for hints of recent relatively complex models).

The issue with GANs is that - while they model the joint probability of the input space - they aren't (easily) inspectable in the sense you can't get any understanding of how inputs relate to outputs. This means they appear different to traditional generative models where this is usually a goal.

_delirium 1807 days ago

For people who want a more stats-grounded approach, VAEs are more or less state of the art these days: https://en.wikipedia.org/wiki/Variational_autoencoder

They are reasonably competitive with GANs. I haven't kept up on the latest models on either side, but VAEs have historically tended to be a little blurrier than GANs.

317070 1805 days ago

I think VAEs haven't been the state of the art since around 2016-2017? They have been squeezed from both directions: autoregressive models on the compression side, GANs on the generation side.

They are still fairly competitive on both sides though.

_delirium 1805 days ago

Yeah, I guess I was thinking of VQVAE as a state-of-the-art example, but it was indeed 2017. Time flies! It's still pretty influential on newer systems though, e.g. OpenAI's DALL-E that made waves earlier this year has a VAE component (in addition to a Transformer component).
