by coffee_am 1411 days ago
One way is to use categorical-set splits [1] (proposed for categorical-set inputs, but they work for plain categorical features as well), as implemented in TF-DF [2]. The split search is greedy and expensive at training time (inference is cheap, though), but it gives great results.
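
To make the idea concrete, here is a rough sketch of a greedy categorical-set split search in plain Python. It is not TF-DF's implementation: the function names, the Gini criterion, and the binary 0/1 labels are all assumptions made for illustration, and it leaves out the refinements described in [1]. The split condition is "the example's set shares at least one item with the mask"; the mask is grown one item at a time, keeping the item that improves the score most.

```python
from typing import List, Sequence, Set, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary (0/1) labels; 0.0 for an empty list."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_score(examples: Sequence[Set[str]], labels: Sequence[int],
                mask: Set[str]) -> float:
    """Impurity reduction of the condition 'example intersects mask'."""
    n = len(labels)
    if n == 0:
        return 0.0
    hit = [y for x, y in zip(examples, labels) if x & mask]
    miss = [y for x, y in zip(examples, labels) if not (x & mask)]
    return gini(labels) - len(hit) / n * gini(hit) - len(miss) / n * gini(miss)


def greedy_set_split(examples: Sequence[Set[str]],
                     labels: Sequence[int]) -> Tuple[Set[str], float]:
    """Grow the mask greedily until no single item improves the score."""
    vocab: List[str] = sorted(set().union(*examples))
    mask: Set[str] = set()
    best = 0.0
    while True:
        chosen, chosen_score = None, best
        # Each step rescans every example for every candidate item,
        # which is why training is costly.
        for item in vocab:
            if item in mask:
                continue
            score = split_score(examples, labels, mask | {item})
            if score > chosen_score + 1e-12:
                chosen, chosen_score = item, score
        if chosen is None:
            return mask, best
        mask.add(chosen)
        best = chosen_score


# A plain categorical feature is just the special case of one-item sets.
examples = [{"a"}, {"b"}, {"c"}, {"d"}, {"a", "c"}, {"b", "d"}]
labels = [1, 0, 1, 0, 1, 0]
mask, score = greedy_set_split(examples, labels)
print(sorted(mask), score)  # prints: ['a', 'c'] 0.5
```

The nested loop over mask size, vocabulary, and examples is where the training cost comes from. At inference the learned condition is a single set-intersection test per node.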

[1] https://arxiv.org/pdf/2009.09991.pdf
[2] https://www.tensorflow.org/decision_forests/text_features