by ScoutOrgo
2570 days ago
I think it is actually preferable, most of the time, to start by converting categorical variables to integer codes, even if they are not ordinal. If a single class is very important, the RF algorithm can still isolate it with two splits (e.g. <= 7, then >= 7 on that branch). The "pool" of features the RF samples from also doesn't get diluted by a pile of one-hot columns that all come from the same feature. I am pretty sure I've seen this done successfully on Kaggle a number of times, but I don't have any sources on hand showing that this method is "better". It does, however, make it much easier to just throw the data into the RF and check the feature importances to see which features are helping the most.
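To make the two-split point and the pool-dilution point concrete, here is a rough sketch in plain Python. Everything in it is made up for illustration: the ten level names, the arbitrary integer codes, the `isolate` helper, and the count of four other features. It is not a random forest, just the arithmetic behind the argument.

```python
# Hypothetical categorical feature with 10 levels (names invented for illustration).
levels = ["c%d" % i for i in range(10)]

# Arbitrary integer codes; no ordinal meaning is implied.
codes = {level: i for i, level in enumerate(levels)}


def isolate(code, target):
    """Apply two threshold splits in sequence, the way a tree branch would."""
    if code <= target:          # first split: keep codes up to the target
        return code >= target   # second split: drop codes below the target
    return False


target = codes["c7"]
isolated = [level for level, code in codes.items() if isolate(code, target)]

# Size of the feature pool, assuming 4 other (hypothetical) numeric features.
other_features = 4
pool_integer = other_features + 1            # one integer-coded column
pool_one_hot = other_features + len(levels)  # one column per level

print(isolated)
print(pool_integer, pool_one_hot)
```

With integer codes the feature takes one slot in the sampling pool; with one-hot encoding the same feature takes ten of the fourteen slots, so the other features get drawn less often at each split.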