Hacker News
by rsfern 1219 days ago
For what it’s worth, I usually default to swish activations, which seem to be popular in my corner of graph neural nets (materials and chemistry). Performance is about the same as ReLU, and I like swish because it’s smooth: ReLU is continuous but has a kink at zero, where its derivative jumps from 0 to 1, whereas swish is differentiable everywhere.
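A rough illustration in plain Python (no framework assumed) of the kink vs. smoothness point. It uses the common definition swish(x) = x * sigmoid(beta * x) with beta = 1 (the variant also called SiLU) and compares one-sided slopes at zero; the helper names and the step size `h` are just for this sketch:

```python
import math

def sigmoid(x: float) -> float:
    # Logistic function, written to avoid overflow for large negative x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def swish(x: float, beta: float = 1.0) -> float:
    # Swish: x * sigmoid(beta * x); beta = 1 is the SiLU variant.
    return x * sigmoid(beta * x)

def relu(x: float) -> float:
    return max(0.0, x)

# One-sided finite-difference slopes at x = 0.
h = 1e-6
relu_left = (relu(0.0) - relu(-h)) / h    # slope from the left
relu_right = (relu(h) - relu(0.0)) / h    # slope from the right
swish_left = (swish(0.0) - swish(-h)) / h
swish_right = (swish(h) - swish(0.0)) / h

# ReLU's slopes disagree (0 vs 1); swish's both approach 0.5.
print(relu_left, relu_right)
print(swish_left, swish_right)
```

If you're on PyTorch, the beta = 1 version ships as `torch.nn.SiLU`, so you don't need to hand-roll it.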