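My point is that it shows up everywhere, just in different forms. Sparse coding penalizes representations that use many basis elements, typically through an L1 penalty on the coefficients. Gaussian process regression tunes how flexible its representation is (e.g. the kernel length scale, which sets how smooth the fitted function is) using Bayesian model selection, i.e. by maximizing the marginal likelihood. SVMs have a slack parameter that dictates how many margin violations you'll tolerate in exchange for a wider margin, i.e. a simpler decision boundary.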
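To make the GP case concrete, here's a minimal sketch. It's my own illustration, not taken from any particular library. It assumes a zero-mean GP with an RBF kernel, unit signal variance, a fixed noise variance of 0.01, and toy `sin(x)` data, and the helper names `rbf_kernel` and `log_marginal_likelihood` are hypothetical. The log marginal likelihood splits into a data-fit term, `-1/2 y^T K^{-1} y`, and a complexity penalty, `1/2 log|K|`, so picking the length scale by maximizing it trades fit against complexity automatically.

```python
import numpy as np

def rbf_kernel(x, length_scale, signal_var=1.0):
    # Squared-exponential kernel: k(x, x') = s^2 * exp(-(x - x')^2 / (2 * l^2))
    d = x[:, None] - x[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def log_marginal_likelihood(x, y, length_scale, noise_var=0.01):
    """Return (total, data_fit, complexity_penalty) for a zero-mean GP (assumed setup)."""
    n = len(x)
    K = rbf_kernel(x, length_scale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                             # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = K^{-1} y
    data_fit = -0.5 * y @ alpha                           # rewards explaining the data
    complexity = np.sum(np.log(np.diag(L)))               # equals 0.5 * log|K|
    const = -0.5 * n * np.log(2 * np.pi)
    return data_fit - complexity + const, data_fit, complexity

# Toy data (assumption): 20 noise-free samples of sin(x) on [0, 6].
x = np.linspace(0.0, 6.0, 20)
y = np.sin(x)

grid = [0.01, 0.1, 0.3, 1.0, 3.0, 10.0, 100.0]
for ell in grid:
    total, fit, penalty = log_marginal_likelihood(x, y, ell)
    print(f"l={ell:>6}: log p(y)={total:8.2f}  fit={fit:8.2f}  penalty={penalty:8.2f}")

# Bayesian model selection: keep the length scale with the highest evidence.
best = max(grid, key=lambda ell: log_marginal_likelihood(x, y, ell)[0])
print("selected length scale:", best)
```

Very short length scales fit the data easily but pay a large `1/2 log|K|` penalty; very long ones are "simple" (small penalty) but explain the data badly. The selected value sits wherever those two terms balance.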