by nartz
1857 days ago
There may be some high-level validity to using Gini splitting as a general way to prioritise variables and better understand the data; however, second-order effects (e.g. dependence between variables) can often dominate, in which case multi-dimensional clusters tend to be a better mental model for logical groupings. The simplest option is something like k-means, or other embedding or latent-variable models computed from matrix factorizations (CF, PCA, etc.) that seek to summarize the data into 'topics' or 'categories'.
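A minimal sketch of the point about second-order effects, under stated assumptions: the toy XOR-style dataset, the helper names (`gini`, `best_split_gain`, `kmeans`) and the farthest-point initialisation are all made up for illustration and are not taken from any library. The label depends only on the interaction of the two features, so no single-feature Gini split should reduce impurity, while a plain k-means with four clusters should recover groups that are each pure in the label.

```python
from itertools import product

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split_gain(points, labels, feature):
    """Largest impurity reduction from one threshold on one feature."""
    n = len(labels)
    best = 0.0
    for t in sorted({p[feature] for p in points}):
        left = [y for p, y in zip(points, labels) if p[feature] <= t]
        right = [y for p, y in zip(points, labels) if p[feature] > t]
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, gini(labels) - children)
    return best

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: dist2(p, centers[j]))

def kmeans(points, k, iters=10):
    """Plain k-means; farthest-point init (an assumption) keeps it deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        assign = [nearest(p, centers) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return [nearest(p, centers) for p in points]

# Hypothetical toy data: four tight blobs at the corners of the unit square,
# labelled by XOR of the corner coordinates (a pure interaction effect).
offsets = [(-0.1, -0.1), (-0.1, 0.1), (0.1, -0.1), (0.1, 0.1), (0.0, 0.0)]
points, labels = [], []
for cx, cy in product((0, 1), repeat=2):
    for dx, dy in offsets:
        points.append((cx + dx, cy + dy))
        labels.append(cx ^ cy)

gains = [best_split_gain(points, labels, f) for f in (0, 1)]
assign = kmeans(points, k=4)
cluster_labels = {j: {y for a, y in zip(assign, labels) if a == j}
                  for j in set(assign)}

print("best single-feature Gini gains:", gains)
print("labels found in each cluster:", cluster_labels)
```

On this deliberately symmetric data every single-feature threshold leaves both sides evenly mixed, so a Gini ranking sees neither variable as useful, whereas the clusters line up with the label. Real data is rarely this clean, so treat it as an illustration of the failure mode rather than a benchmark.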