It is used to approximate a uniform distribution inside the model during back-propagation.

During the training of a VAE, the encoder is used along with the decoder in two different settings:

  • Gaussian Distribution
    • The encoder predicts the parameters of a Gaussian distribution (mean μ and variance σ²).
    • A sample Z is drawn from this Gaussian distribution.
    • This Z is passed to the decoder to reconstruct the image.
    • During back-propagation, Z can be re-parameterised as Z = μ + σ ⊙ ε, where ε ~ N(0, I).
    • This re-parameterisation trick isolates the randomness in ε, allowing gradients to flow back from the decoder to the encoder through μ and σ.
  • Categorical Distribution
    • If we are using a categorical distribution, the encoder produces probabilities over the entries of a codebook.
    • A sample Z is drawn from this categorical distribution.
    • Unlike the Gaussian case, Z cannot be written as a deterministic, differentiable function of the predicted probabilities, so there is no direct re-parameterisation trick.
    • As a result, no gradients can back-propagate from the decoder to the encoder.
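The Gaussian re-parameterisation above can be sketched in a few lines of numpy. The specific values of mu and log_var are hypothetical stand-ins for an encoder's outputs; only the re-parameterisation step itself comes from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for a single 4-dimensional latent:
mu = np.array([0.5, -1.0, 0.0, 2.0])       # predicted mean
log_var = np.array([0.0, 0.2, -0.5, 0.1])  # predicted log-variance

# Re-parameterisation trick: Z = mu + sigma * eps, with eps ~ N(0, I).
# The randomness lives entirely in eps, so gradients can flow through
# mu and sigma back into the encoder.
eps = rng.standard_normal(mu.shape)
sigma = np.exp(0.5 * log_var)
z = mu + sigma * eps
```

Predicting log-variance rather than variance is a common convention, since it keeps sigma positive without any constraint on the encoder's raw output.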

To allow differentiable sampling in the categorical case, the Gumbel-Max trick is used: add i.i.d. Gumbel noise gᵢ to the log-probabilities and take Z = arg-maxᵢ (log pᵢ + gᵢ). But arg-max itself is not differentiable, so it is approximated with a softmax and a temperature parameter τ, i.e. softmax over [(log p₁ + g₁)/τ, (log p₂ + g₂)/τ, …] — the Gumbel-Softmax trick.
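A minimal numpy sketch of the Gumbel-Softmax relaxation (the function name and the 3-entry codebook are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau, rng):
    """Differentiable relaxation of sampling from a categorical distribution."""
    # Gumbel noise: g = -log(-log(u)), with u ~ Uniform(0, 1).
    u = rng.uniform(size=logits.shape)
    g = -np.log(-np.log(u))
    # Gumbel-Max would take argmax(logits + g); instead apply a
    # temperature-scaled softmax so the operation stays differentiable.
    y = (logits + g) / tau
    y = np.exp(y - y.max())          # subtract max for numerical stability
    return y / y.sum()

# Hypothetical encoder probabilities over a 3-entry codebook:
logits = np.log(np.array([0.1, 0.2, 0.7]))
soft_sample = gumbel_softmax(logits, tau=1.0, rng=rng)
```

The result is a point on the probability simplex rather than a hard one-hot vector; the decoder can consume it (e.g. as a soft mixture of codebook entries) while gradients flow back to the logits.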

The temperature τ is slowly annealed toward zero during training; as τ → 0, the softmax output approaches a one-hot vector, i.e. softmax behaves like arg-max.
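The effect of the temperature can be seen directly (the score vector below is an arbitrary example):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

scores = np.array([1.0, 2.0, 3.0])

# High temperature: the output is smoothed toward uniform.
hot = softmax(scores / 5.0)
# Low temperature: the mass concentrates on the arg-max entry,
# so the softmax output approaches a one-hot vector.
cold = softmax(scores / 0.01)
```

This is why annealing τ toward zero recovers hard (arg-max) sampling at the end of training while keeping the earlier training steps differentiable.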