by maxk42 4578 days ago
I'll be the first to say it: Your data is either incorrect, arbitrary, or we're missing some information here.

Why does everyone have 7.5 to 8 siblings, 7.5 to 8 "weekly workouts", and 7.5 to 8 platinum albums?


Specifically, you should explain all the columns, including:

- Is that the person's height in inches?

- What does the asterisk in certain column names indicate?

- Why do the pets, platinum_albums, weekly_workouts, number_of_siblings and pokemon_collected values all seem to fall in the range of 7 to 8?
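A quick way to spot the clustering described above is to check each numeric column's min and max and flag any column whose spread is suspiciously narrow. This is a minimal sketch, not anything from the dataset's README: the column names are taken from this thread, but the rows are made-up placeholders, and the `narrow_columns` helper and its `max_spread=1.0` threshold are hypothetical choices for illustration.

```python
# Hypothetical sketch: flag numeric columns whose values all fall in a narrow band.
# Column names come from the thread; the rows are made-up placeholders,
# NOT values from the actual dataset.

def narrow_columns(rows, columns, max_spread=1.0):
    """Return {column: (min, max)} for columns whose range is <= max_spread."""
    flagged = {}
    for col in columns:
        values = [row[col] for row in rows]
        lo, hi = min(values), max(values)
        if hi - lo <= max_spread:
            flagged[col] = (lo, hi)
    return flagged

# Placeholder rows (assumed, for illustration only).
rows = [
    {"pets": 7.6, "platinum_albums": 7.9, "age": 24},
    {"pets": 7.8, "platinum_albums": 7.5, "age": 41},
    {"pets": 7.5, "platinum_albums": 8.0, "age": 33},
]

print(narrow_columns(rows, ["pets", "platinum_albums", "age"]))
# {'pets': (7.5, 7.8), 'platinum_albums': (7.5, 8.0)}
```

A column like age spreads out as you'd expect, while the flagged ones look like they were rescaled or generated rather than counted.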

Also, this dataset is far too small. There is a single male-male relationship, and that's not going to give you statistically meaningful results if we're looking at gender at all.
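To see how thin each gender combination is, you could count the pairs per combination before drawing any conclusions. A minimal sketch, assuming each relationship can be reduced to a pair of gender labels; the `pairs` list below is made up, and the `min_size` cutoff is an arbitrary rule-of-thumb threshold, not a real significance test.

```python
from collections import Counter

def pair_counts(pairs):
    """Count unordered gender combinations, so ('M', 'F') and ('F', 'M') are pooled."""
    return Counter(tuple(sorted(p)) for p in pairs)

def small_groups(counts, min_size=30):
    """Return combinations with fewer than min_size examples (assumed rule-of-thumb cutoff)."""
    return {combo: n for combo, n in counts.items() if n < min_size}

# Made-up relationships for illustration only (not the real data).
pairs = [("M", "F"), ("F", "F"), ("F", "M"), ("M", "M"), ("F", "F")]
counts = pair_counts(pairs)
print(small_groups(counts, min_size=2))
# {('M', 'M'): 1}
```

With one male-male pair, any per-group comparison for that group rests on a single observation.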

I would also argue that it's not the best set of metrics to use to determine whether people will become friends. Age and facebook_friends_count might give you some hints, but I seriously doubt that shoe size has as big an impact on the potential for friendship as, say, common interests, shared culture, income class, or other socioeconomic factors.

The headers with asterisks are intentionally mislabeled. I've updated the README to make this clearer.
You write in the README that the mislabeled columns are "from our internal ratings". Can you give a more definite sense of what this means? What are these ratings based on? What are they designed to reflect? How are they computed (roughly)?