by bigred100 2398 days ago
Are the results different from just doing something like L1 regularization?
1 comment

I've always wondered this and can never find a satisfactory answer online. You'd think that if ReLU works, then LASSO shouldn't present any problems either, right?

In addition, L2 tends to discourage sparsity by spreading weight across correlated features, which seems antithetical to the mission of pruning. (For example, if you run a ridge regression with two identical features, the L2 penalty will assign equal coefficients to both instead of zeroing one out, as L1 typically does.)
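
To make the identical-features example concrete, here is a minimal pure-Python sketch. The toy data, the penalty strength of 14, and the helper names (`soft_threshold`, `coordinate_descent`) are my own choices for illustration, not taken from any library. One caveat: with exactly duplicated features the L1 objective only cares about the sum of the two coefficients, so which one gets zeroed depends on the solver and its update order. Coordinate descent started from zero happens to put everything on the first feature.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return exactly 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def coordinate_descent(columns, y, lam, penalty, sweeps=200):
    """Minimise 0.5 * RSS + penalty term by cyclic coordinate descent.

    penalty="l2" uses 0.5 * lam * sum(w_j ** 2)  (ridge)
    penalty="l1" uses lam * sum(abs(w_j))        (lasso)
    """
    w = [0.0] * len(columns)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i, target in enumerate(y):
                others = sum(
                    w[k] * columns[k][i] for k in range(len(columns)) if k != j
                )
                rho += col[i] * (target - others)
            sq_norm = sum(v * v for v in col)
            if penalty == "l2":
                w[j] = rho / (sq_norm + lam)
            else:
                w[j] = soft_threshold(rho, lam) / sq_norm
    return w


# Toy data (assumed): y = 2 * x, and the same feature x supplied twice.
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
columns = [x, x]

ridge_w = coordinate_descent(columns, y, lam=14.0, penalty="l2")
lasso_w = coordinate_descent(columns, y, lam=14.0, penalty="l1")

print("ridge:", ridge_w)
print("lasso:", lasso_w)
```

On this data the ridge fit ends up with two equal coefficients (about 2/3 each), while the lasso fit keeps the first coefficient and sets the second to exactly zero. It says nothing about whether the same holds for pruning a real network, only that the two penalties behave differently in the duplicated-feature case.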