by singularity2001 3044 days ago
also there are 'leaky' ReLUs:

f(x) = a*x if x < 0 else b*x

usually 0 < a << b
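A minimal Python sketch of a leaky ReLU. It assumes a and b are the slopes on either side of zero, so they are also the values of the derivative. The defaults use the common choice b = 1, a = 0.01. The function names are illustrative and not taken from any library.

```python
def leaky_relu(x: float, a: float = 0.01, b: float = 1.0) -> float:
    """Piecewise-linear activation: slope a for x < 0, slope b otherwise."""
    return a * x if x < 0 else b * x


def leaky_relu_grad(x: float, a: float = 0.01, b: float = 1.0) -> float:
    """Derivative of leaky_relu; the value at x == 0 is taken to be b by convention."""
    return a if x < 0 else b


if __name__ == "__main__":
    for x in (-2.0, 0.0, 3.0):
        print(x, leaky_relu(x), leaky_relu_grad(x))
```

Setting a = 0 recovers the plain ReLU, whose derivative is exactly zero for every negative input.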

I actually think the idea of using leaky ReLUs is interesting, because they still provide a small gradient when x < 0, which may slightly alleviate the vanishing-gradient issue.
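The following toy sketch makes the "small gradient" point concrete. It compares the weight gradient under ReLU and under a leaky ReLU with slope 0.01 when the pre-activation is negative. The setup is assumed for illustration: a single unit y = f(w*x) trained with squared-error loss, with made-up names and numbers.

```python
from typing import Callable


def relu(z: float) -> float:
    return z if z > 0 else 0.0


def relu_grad(z: float) -> float:
    """Derivative of ReLU: 0 for z < 0 (value at 0 taken as 1 by convention)."""
    return 0.0 if z < 0 else 1.0


def leaky_relu(z: float, a: float = 0.01) -> float:
    return a * z if z < 0 else z


def leaky_relu_grad(z: float, a: float = 0.01) -> float:
    """Derivative of a leaky ReLU with negative-side slope a and positive slope 1."""
    return a if z < 0 else 1.0


def weight_gradient(w: float, x: float, target: float,
                    act: Callable[[float], float],
                    act_grad: Callable[[float], float]) -> float:
    """dL/dw for y = act(w * x) with loss L = (y - target)**2, via the chain rule."""
    z = w * x
    y = act(z)
    return 2.0 * (y - target) * act_grad(z) * x


if __name__ == "__main__":
    # Negative pre-activation: z = -1.0 * 2.0 = -2.0
    print(weight_gradient(-1.0, 2.0, 1.0, relu, relu_grad))
    print(weight_gradient(-1.0, 2.0, 1.0, leaky_relu, leaky_relu_grad))
```

With ReLU the gradient is exactly zero, so gradient descent never moves w. With the leaky variant the gradient is small but nonzero. Its sign makes a descent step increase w, which pushes w*x back toward positive values.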
I'm aware. He's using ReLU in this case though.