by cyorir 2351 days ago
Gradient descent should have no issues with a function like exp(-x^2), since it is smooth and differentiable everywhere. In fact, softmax (softmax(x_i) = exp(x_i)/sum_j(exp(x_j))) is sometimes used as an activation function, and for some use cases it could make sense to modify it to use -x_i^2 in place of x_i. However, that doesn't always make sense as a drop-in replacement for other activation functions like ReLU or sigmoid.
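
To make the idea concrete, here is a rough sketch in plain Python of the standard softmax next to the modified version with -x^2 in place of x. The names neg_square_softmax and gaussian_bump_grad are made up for illustration and don't come from any library; the max-subtraction is just the usual numerical-stability trick and doesn't change the result.

```python
import math

def softmax(xs):
    # Standard softmax: exp(x_i) / sum_j exp(x_j).
    # Subtracting the max first avoids overflow and leaves the output unchanged.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def neg_square_softmax(xs):
    # Hypothetical variant: apply softmax to -x_i^2 instead of x_i,
    # so inputs closest to zero get the largest weight.
    return softmax([-x * x for x in xs])

def gaussian_bump_grad(x):
    # Derivative of exp(-x^2) is -2x * exp(-x^2), defined for every real x.
    return -2.0 * x * math.exp(-x * x)

if __name__ == "__main__":
    xs = [-2.0, 0.0, 1.0]
    print(softmax(xs))             # largest weight on the largest input
    print(neg_square_softmax(xs))  # largest weight on the input nearest zero
    print([gaussian_bump_grad(x) for x in xs])
```

Note how the variant ignores the sign of each input, which is one reason it behaves quite differently from ReLU or sigmoid.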