I've always wondered about this and can never find a satisfactory answer online. You'd think that if ReLU works, then LASSO shouldn't present any problems either, right?
In addition, L2 tends to discourage sparsity by spreading out the influence of weights, which seems antithetical to the mission of pruning. (For example, if you run a ridge regression with two identical features, the L2 penalty will assign equal coefficients to both instead of zeroing one out, as L1 typically does.)
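To make the two-identical-features example concrete, here is a minimal NumPy sketch. Everything in it is an assumption for illustration: the synthetic data, the penalty strengths, and the helper names `ridge_fit` and `lasso_fit`. The lasso solver is a plain cyclic coordinate descent started from zero. With exactly duplicated features the lasso minimizer is not unique, so which copy ends up at zero depends on the solver and its starting point.

```python
import numpy as np

# Synthetic data (assumed for illustration): one signal, duplicated as two features.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([x, x])              # two identical columns
y = 3.0 * x + 0.1 * rng.normal(size=n)

def ridge_fit(X, y, lam):
    """Closed-form ridge: minimize ||y - Xw||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become 0."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def lasso_fit(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for (1/(2n))||y - Xw||^2 + alpha * ||w||_1,
    starting from w = 0."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        for j in range(d):
            # Residual with feature j's current contribution added back in.
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r / n
            z = X[:, j] @ X[:, j] / n
            w[j] = soft_threshold(rho, alpha) / z
    return w

w_ridge = ridge_fit(X, y, lam=1.0)
w_lasso = lasso_fit(X, y, alpha=0.1)
print("ridge:", w_ridge)   # the two coefficients share the weight equally
print("lasso:", w_lasso)   # the first coefficient carries the weight, the second is (numerically) zero
```

In this setup the ridge coefficients come out equal, while the coordinate-descent lasso puts essentially all of the weight on the first copy, which is the behavior the example above describes.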