by streptomycin 4578 days ago
Is there more description of the data anywhere? Like what does having an "f_number_of_pets" of 7.5 mean?

I just noticed in the FAQ it states, "...several fields have been renamed of course." If I'm understanding this correctly, any real-world conclusions you draw will be completely meaningless, as we're essentially working from a mislabeled dataset.
Not necessarily. The columns might as well be named attribute_1, attribute_2, ..., attribute_n. ML algorithms don't care about the meaning of the features.
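To make that concrete, here is a minimal sketch, not anything taken from the dataset itself: a tiny 1-nearest-neighbor classifier run once with descriptive column names and once with anonymous ones. The rows, labels, and the f_age column are made up for illustration; only the f_number_of_pets name comes from this thread.

```python
import math

def nearest_neighbor_predict(train_rows, train_labels, query):
    """Return the label of the training row closest to query (1-NN, Euclidean)."""
    def distance(row):
        return math.sqrt(sum((row[key] - query[key]) ** 2 for key in row))
    best = min(range(len(train_rows)), key=lambda i: distance(train_rows[i]))
    return train_labels[best]

def rename_columns(row, mapping):
    """Return a copy of row with every column renamed through mapping."""
    return {mapping[key]: value for key, value in row.items()}

# Hypothetical toy data; f_age and all values are assumptions for the example.
rows = [
    {"f_number_of_pets": 7.5, "f_age": 30.0},
    {"f_number_of_pets": 0.0, "f_age": 62.0},
    {"f_number_of_pets": 2.0, "f_age": 41.0},
]
labels = ["a", "b", "a"]
queries = [
    {"f_number_of_pets": 1.0, "f_age": 60.0},
    {"f_number_of_pets": 6.0, "f_age": 28.0},
]

# Anonymize the column names, as the comment above suggests.
mapping = {"f_number_of_pets": "attribute_1", "f_age": "attribute_2"}
renamed_rows = [rename_columns(row, mapping) for row in rows]
renamed_queries = [rename_columns(query, mapping) for query in queries]

original_predictions = [
    nearest_neighbor_predict(rows, labels, query) for query in queries
]
renamed_predictions = [
    nearest_neighbor_predict(renamed_rows, labels, query)
    for query in renamed_queries
]

# The names never enter the distance computation, so the predictions match.
print(original_predictions == renamed_predictions)
```

The model only ever sees the numbers, so the fit is identical either way. What renaming removes is the analyst's ability to reason about the columns, not the algorithm's ability to use them.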
That's true, but to have the best chance of designing a good method or analysis, I need to know what the variables mean. Otherwise, it is harder to decide which variables belong in a model, which transformations make sense, which approaches might work best, etc.
I would echo this sentiment. Not only are the columns intentionally mislabeled, but they also appear to be computed, meaning some of the variance inherent to the original sample will have been lost.
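A rough sketch of why computed columns lose variance, under a loud assumption: the FAQ doesn't say how the fields were derived, so this simply supposes each published value is a mean over a group of original records (which would also explain a value like 7.5 pets). The data here is randomly generated, not drawn from the real dataset.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical raw sample: pet counts for 1000 individuals (made-up data).
raw = [random.randint(0, 15) for _ in range(1000)]

# Assumption: each published value is the mean of 10 original records.
group_size = 10
group_means = [
    statistics.fmean(raw[start:start + group_size])
    for start in range(0, len(raw), group_size)
]

# Population variance before and after the aggregation step.
raw_variance = statistics.pvariance(raw)
computed_variance = statistics.pvariance(group_means)

print(f"variance of raw values:  {raw_variance:.2f}")
print(f"variance of group means: {computed_variance:.2f}")
```

With equal-sized groups, the total variance splits into a within-group part and a between-group part, and averaging keeps only the latter. Whatever spread existed inside each group is gone from the published column.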