by radarsat1
823 days ago
> Adding scaled unit gaussian noise to the logits

  noise = torch.randn_like(logits)*F.softplus(noise_logits)
  noisy_logits = logits + noise
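For anyone who wants to run the quoted snippet, here is a self-contained sketch. It assumes `logits` and `noise_logits` come from two separate linear projections of the token embeddings, as in a noisy top-k router. The layer names and sizes are hypothetical and not taken from the original code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes: 4 tokens, 16-dim embeddings, 8 choices (e.g. experts)
n_tokens, n_embed, n_choices = 4, 16, 8
x = torch.randn(n_tokens, n_embed)

# Hypothetical projections: one for the clean logits, one for the per-logit noise scale
route_linear = nn.Linear(n_embed, n_choices)
noise_linear = nn.Linear(n_embed, n_choices)

logits = route_linear(x)
noise_logits = noise_linear(x)

# Unit Gaussian noise scaled by a learned, strictly positive std (softplus(z) > 0)
noise = torch.randn_like(logits) * F.softplus(noise_logits)
noisy_logits = logits + noise

# The noise is reparameterized, so gradients reach both projections
noisy_logits.sum().backward()
print(noisy_logits.shape)  # torch.Size([4, 8])
```

Because the randomness enters only through `torch.randn_like`, the learned noise scale stays differentiable.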
Question: if you swapped this Gaussian noise for Gumbel noise, you would get something like Gumbel-softmax, right? Why not use it here? Isn't that the usual way to implement differentiable discrete selection? I ask because I've had some trouble getting Gumbel-softmax to work in practice, so I'm wondering whether it has downsides compared to other methods. Honestly, though, just adding normal noise like this seems simpler anyway.
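Here is a minimal sketch of what the swap would look like, using standalone toy logits. The logit values, sample count, and `tau` are arbitrary choices for illustration. It first checks the Gumbel-max property empirically: the argmax of logits plus standard Gumbel noise is a sample from `softmax(logits)`. It then shows the relaxed, differentiable version, including PyTorch's built-in `F.gumbel_softmax`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.tensor([2.0, 1.0, 0.0, -1.0])
n_samples = 200_000

# Standard Gumbel(0, 1) noise: -log(E) with E ~ Exponential(1)
gumbel = -torch.empty(n_samples, logits.numel()).exponential_().log()

# Gumbel-max trick: argmax(logits + g) is an exact sample from softmax(logits)
samples = (logits + gumbel).argmax(dim=-1)
freqs = torch.bincount(samples, minlength=logits.numel()).float() / n_samples
probs = F.softmax(logits, dim=-1)
print(freqs)  # empirically close to probs
print(probs)

# Gumbel-softmax: replace argmax with a temperature-scaled softmax so it is differentiable
tau = 0.5
soft_sample = F.softmax((logits + gumbel[0]) / tau, dim=-1)

# Built-in version; hard=True returns a one-hot forward pass with straight-through gradients
hard_sample = F.gumbel_softmax(logits, tau=tau, hard=True)
```

The temperature is one commonly cited source of trouble. A low `tau` gives nearly one-hot samples but high-variance gradients, while a high `tau` gives smoother gradients but samples that are far from discrete. Note also that adding Gaussian noise and taking the argmax does not sample exactly from `softmax(logits)`. That may not matter for a router that only needs exploration noise.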