Specifically, you should explain all the columns, including:
- Is that the person's height in inches?
- What does the asterisk in certain column-names indicate?
- Why do the pets, platinum_albums, weekly_workouts, number_of_siblings and pokemon_collected values seem to fall in the range of 7 - 8?
Also, this dataset is far too small. There is a single male-male relationship and that's not going to provide any significant data if we're looking at genders at all.
I would also argue that it's not the best set of metrics to use to determine whether people will become friends. Age and facebook_friends_count might give you some hints, but I seriously doubt that shoe size has as big an impact on the potential for friendship as, say, common interests, shared culture, income class, or other socioeconomic factors.
You write in the README that the mislabeled columns are "from our internal ratings". Can you give any more definite sense of what this means? What kind of things are these ratings based off of? What are they designed to reflect? How are they computed (roughly)?
- Is that the person's height in inches?
- What does the asterisk in certain column-names indicate?
- Why do the pets, platinum_albums, weekly_workouts, number_of_siblings and pokemon_collected values seem to fall in the range of 7 - 8?
Also, this dataset is far too small. There is a single male-male relationship and that's not going to provide any significant data if we're looking at genders at all.
I would also argue that it's not the best set of metrics to use to determine whether people will become friends. Age and facebook_friends_count might give you some hints, but I seriously doubt that shoe size has as big an impact on the potential for friendship as, say, common interests, shared culture, income class, or other socioeconomic factors.