| HN Mirror

Say you want to represent a categorical feature that has many levels. You're likely to one-hot encode it. This can lead to a lack of parity between how sparsely you treat your continuous features versus your categorical ones, where continuous features can't be partially included, but one-hot-encoded categoricals can. Sometimes you want to be sparse on your original variables, rather than sparse on dimensions of their encoding. It's a good inductive bias that if, for example, profession is relevant to your model, then all professions are relevant (and you still get to ridge penalize the individual professions). Nice for interpretability too.