| HN Mirror

	by streptomycin 4578 days ago
	Is there a fuller description of the data anywhere? For example, what does an "f_number_of_pets" value of 7.5 mean?

mkwng 4578 days ago

I just noticed that the FAQ states, "...several fields have been renamed of course." If I'm understanding this correctly, any real-world conclusions you draw will be completely meaningless, since we're essentially working from a mislabeled dataset.

ergest 4578 days ago

Not necessarily. They might as well be named attribute_1, attribute_2, ..., attribute_n. ML algorithms don't care what the features mean.
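To illustrate the point, here is a minimal sketch (Python, standard library only) of a 1-nearest-neighbor classifier that takes its columns as a name-to-values mapping. The column names, toy data, and helper functions are all hypothetical. The point is only that consistently renaming every column leaves the prediction unchanged, because the names are used to line values up and never enter the arithmetic.

```python
import math


def nearest_neighbor_predict(train_cols, train_labels, query):
    """Return the label of the training row closest to `query` (1-nearest-neighbor).

    train_cols maps column name -> list of values (one per training row);
    query maps column name -> value. Names are only used to line values up.
    """
    best_label, best_dist = None, math.inf
    for i, label in enumerate(train_labels):
        # Euclidean distance over whatever columns the query has.
        dist = math.sqrt(
            sum((train_cols[name][i] - value) ** 2 for name, value in query.items())
        )
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def rename(mapping, columns):
    """Rename the keys of a column mapping (works for both training columns and a query)."""
    return {mapping[name]: values for name, values in columns.items()}


# Hypothetical toy data with descriptive column names...
train = {"f_number_of_pets": [0, 2, 7.5, 1], "f_age": [25, 40, 33, 60]}
labels = ["no", "yes", "yes", "no"]
query = {"f_number_of_pets": 6, "f_age": 35}

# ...and a mapping to meaningless names.
mapping = {"f_number_of_pets": "attribute_1", "f_age": "attribute_2"}

print(nearest_neighbor_predict(train, labels, query))  # → yes
print(nearest_neighbor_predict(rename(mapping, train), labels,
                               rename(mapping, query)))  # → yes
```

This only shows that the algorithm is indifferent to names when the same renaming is applied to training and query data; it does not address whether an analyst can choose sensible features and transformations without knowing what the columns are.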

JFoss117 4578 days ago

That's true, but to have the best chance of designing a good method/analysis, I need to know what the variables in my analysis mean. Otherwise, it is tougher to make decisions about what variables it makes sense to include in a model, what sorts of transformations make sense, what sort of approaches might work best, etc.
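A small, hypothetical sketch of why meaning matters for preprocessing: the same column of integers calls for different handling depending on what it represents. If it is a count (like a number of pets), a log1p transform to tame skew might be reasonable; if it is a category code, one-hot encoding is usually more appropriate, since the numeric ordering of the codes is arbitrary. The `kind` argument is an assumption standing in for exactly the information a renamed, undocumented column withholds.

```python
import math


def encode(values, kind):
    """Encode one column according to what it is assumed to represent.

    `kind` is a hypothetical label supplied by the analyst:
      "count"    -> log(1 + x), a common choice for skewed non-negative counts
      "category" -> one-hot vectors, because codes like 1, 2, 3 have no real order
    """
    if kind == "count":
        return [[math.log1p(v)] for v in values]
    if kind == "category":
        levels = sorted(set(values))
        return [[1.0 if v == level else 0.0 for level in levels] for v in values]
    raise ValueError(f"unknown column kind: {kind!r}")


column = [1, 2, 3, 2]
print(encode(column, "count"))     # one log-scaled feature per row
print(encode(column, "category"))  # three indicator features per row
```

Without documentation, there is no principled way to pick `kind`, which is the kind of decision the comment above describes.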

idm 4578 days ago

I would echo this sentiment. Not only are the columns intentionally mislabeled, but they also appear to be computed values, meaning some of the variance in the original sample will have been lost.
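To make this concrete, and to suggest one possible (unconfirmed) reading of a non-integer value like 7.5 pets, here is a sketch that assumes, purely for illustration, that a column was computed by replacing each row with a group average. The region names and counts are made up. Replacing values with group means keeps only the between-group variance; the within-group part is gone for good.

```python
from statistics import pvariance

# Hypothetical raw data: per-person pet counts, keyed by a made-up region.
# Both regions have the same number of rows, which keeps the arithmetic simple.
raw_by_region = {
    "region_a": [0, 1, 2, 1],
    "region_b": [5, 6, 9, 10],
}

# The original per-row values, flattened into one column.
raw_column = [v for values in raw_by_region.values() for v in values]

# A "computed" column: every row replaced by its region's average.
computed_column = [
    sum(values) / len(values)
    for values in raw_by_region.values()
    for _ in values
]

# Average within-region variance (a plain average is valid only because
# the regions are equal-sized).
within = sum(pvariance(values) for values in raw_by_region.values()) / len(raw_by_region)

print(sorted(set(computed_column)))  # → [1.0, 7.5]
print(pvariance(raw_column))         # → 12.9375 (total variance of the raw values)
print(pvariance(computed_column))    # → 10.5625 (between-region variance only)
print(within)                        # → 2.375 (the part lost by averaging)
```

The numbers follow the law of total variance: total variance equals between-group variance plus average within-group variance. A model trained on `computed_column` has no way to recover the 2.375 that averaging discarded.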
