Hacker News
by shawntan 766 days ago
There's a common trap people working on deep learning tend to fall into: asking "Why don't we learn the activation function as well?"

The answer to that really should be that a combination of linear layers and fixed non-linear activations can already learn the non-linearities you need. https://twitter.com/bozavlado/status/1787376558484709691
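
To make that concrete, here is a minimal sketch (mine, not taken from the linked tweet) that builds a one-hidden-layer ReLU network by hand so that it matches a target non-linearity at a set of knots. The choice of tanh as the target, the interval [-3, 3], the 25 knots, and the helper name `relu_interpolant` are all assumptions made for illustration. Nothing is trained here; the weights are written down directly.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_interpolant(f, knots):
    """Build a one-hidden-layer ReLU net that matches f at every knot.

    Hypothetical helper for illustration: each hidden unit is
    relu(x - knot), and its output weight is the change in slope
    of the piecewise-linear interpolant at that knot.
    """
    knots = np.asarray(knots, dtype=float)
    values = f(knots)
    slopes = np.diff(values) / np.diff(knots)    # slope on each segment
    out_weights = np.diff(slopes, prepend=0.0)   # slope change at each knot
    biases = -knots[:-1]                         # the last knot needs no unit

    def net(x):
        x = np.asarray(x, dtype=float)
        hidden = relu(x[..., None] + biases)     # linear layer, then fixed ReLU
        return hidden @ out_weights + values[0]  # linear readout plus offset

    return net

# Assumed target: tanh on [-3, 3], standing in for "the activation you wanted".
knots = np.linspace(-3.0, 3.0, 25)
net = relu_interpolant(np.tanh, knots)

xs = np.linspace(-3.0, 3.0, 1001)
max_err = float(np.max(np.abs(net(xs) - np.tanh(xs))))
print(f"max |net - tanh| on [-3, 3]: {max_err:.4f}")
```

Adding more knots (more hidden units) shrinks the error inside the interval, which is the sense in which a fixed activation plus linear layers can stand in for a learned one there.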

That said, there are other types of functions, such as periodic ones (think: sin, cos), that these "universal approximator" formulations don't extrapolate well to, and solutions to that might actually be an improvement.
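
A rough sketch of the extrapolation problem, under assumptions I picked for illustration: fit sin on [-pi, pi] by least squares using fixed ReLU features, then evaluate on [2*pi, 4*pi]. Past the last knot the ReLU model is a straight line, so it cannot follow the oscillation. The second model uses sin/cos features and is deliberately rigged in its favour, since the target lies exactly in its span; it is only there to show what a periodic inductive bias buys you, not as a proposed method. The function names, knot count, and ranges are all hypothetical.

```python
import numpy as np

def relu_features(x, knots):
    # Columns: bias, linear term, then one relu(x - knot) per knot.
    x = np.asarray(x, dtype=float)
    return np.column_stack(
        [np.ones_like(x), x, np.maximum(x[:, None] - knots, 0.0)]
    )

def periodic_features(x):
    # Assumed periodic basis, chosen to contain the target exactly.
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.sin(x), np.cos(x)])

x_train = np.linspace(-np.pi, np.pi, 400)        # where we fit
x_far = np.linspace(2 * np.pi, 4 * np.pi, 400)   # where we extrapolate
knots = np.linspace(-np.pi, np.pi, 20)[1:-1]     # interior knots only

# Least-squares readout weights on top of the fixed features.
w_relu, *_ = np.linalg.lstsq(
    relu_features(x_train, knots), np.sin(x_train), rcond=None
)
w_per, *_ = np.linalg.lstsq(
    periodic_features(x_train), np.sin(x_train), rcond=None
)

def max_abs_err(features, weights, x):
    return float(np.max(np.abs(features @ weights - np.sin(x))))

relu_in = max_abs_err(relu_features(x_train, knots), w_relu, x_train)
relu_far = max_abs_err(relu_features(x_far, knots), w_relu, x_far)
per_far = max_abs_err(periodic_features(x_far), w_per, x_far)

print(f"ReLU features, inside training range:  {relu_in:.4f}")
print(f"ReLU features, outside training range: {relu_far:.4f}")
print(f"sin/cos features, outside range:       {per_far:.2e}")
```

The ReLU fit is fine where it saw data and drifts off linearly where it didn't, which is the failure mode the sin/cos remark is pointing at.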