	by nottombrown 4578 days ago
	Hey HN, Grouper founder here. Let me know if you have any questions about the contest.

7 comments

ddod 4578 days ago

This is the sort of thing I'm personally very interested in, and I have some pretty novel ideas for how I'd approach it. That said, I wouldn't participate in this because it clearly devalues the industry. You should really rethink your approach.

Developers who are considering participation in this, I'd suggest you build something for yourself with data acquired elsewhere.

libria 4578 days ago

> I wouldn't participate in this because it clearly devalues the industry.

People this may be aimed at:

* Experienced devs in boring day-jobs who are seeking some kind of off-time challenge.

* People who are just getting into ML and want to solve something real.

* CS students with spare time.

You know more about ML than me, but it doesn't sound like they're looking for a cancer cure; just fishing around for a one-off challenge. Or maybe they're taking names for future interview candidates.

> Developers who are considering participation in this, I'd suggest you build something for yourself with data acquired elsewhere.

Relax, dude. If people think this is an interesting problem to solve, what's that to you?

jameszhang 4578 days ago

Honestly, I think this is a very cool challenge. As someone who just went on a Grouper last night in Boston and had a great time, I think I just might participate and submit something. Do you have any limitations on how many people can form a team? Personally, I would pair on this with my roommate. He's the big data guy, and I'm the coder.

nottombrown 4578 days ago

Feel free to work together as a team. Glad that you guys had fun last night :D

JFoss117 4578 days ago

A few questions about the data:

1. How is it collected? From a survey, or grabbed from user FB profiles?

2. What is the platinum albums variable? Maybe the number of platinum albums that the user likes on FB??

3. Why are there some "male" entries in the f_gender column, and some "female" entries in the m_gender column?
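For anyone wanting to check their own copy of the data for the problem in question 3, here is a minimal sketch. It assumes the file is a CSV with a header row and that the `f_gender` and `m_gender` columns hold the literal strings "female" and "male"; the function name and the toy rows are hypothetical, not the contest data.

```python
import csv
import io

def find_gender_mismatches(rows):
    """Return the indices of rows where f_gender is not "female" or m_gender is not "male"."""
    mismatches = []
    for i, row in enumerate(rows):
        # Normalize whitespace and case before comparing (assumption about the raw values).
        f = row["f_gender"].strip().lower()
        m = row["m_gender"].strip().lower()
        if f != "female" or m != "male":
            mismatches.append(i)
    return mismatches

# Hypothetical toy rows; only the two gender columns matter here.
sample_csv = io.StringIO(
    "f_gender,m_gender\n"
    "female,male\n"
    "male,female\n"
    "female,male\n"
)
rows = list(csv.DictReader(sample_csv))
print(find_gender_mismatches(rows))  # [1]
```

With the real file you would pass `csv.DictReader(open(path, newline=""))` instead of the in-memory sample.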

nottombrown 4578 days ago

1. The data is collected from the user's FB profile or comes from our internal ratings.

2. The platinum_albums header is just a joke; we anonymized the data.

3. Thanks for pointing that out. There was a bug with a few rows that is now fixed.

yankoff 4578 days ago

Why didn't you guys want to run this competition on Kaggle? That could get it more attention from data scientists.

streptomycin 4578 days ago

Is there more description of the data anywhere? Like what does having an "f_number_of_pets" of 7.5 mean?

mkwng 4578 days ago

I just noticed in the FAQ it states, "...several fields have been renamed of course." If I'm understanding this correctly, any real-world conclusions you draw will be completely meaningless, as we're essentially working from a mislabeled dataset.

ergest 4578 days ago

Not necessarily. They might as well be named attribute_1, attribute_2....attribute_n. ML algorithms don't care about the meaning of the features.
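To illustrate the point that the algorithm only sees numbers, here is a minimal sketch using a 1-nearest-neighbor classifier (chosen only because it is short; any model that consumes feature vectors behaves the same way). The toy records and the `predict` helper are hypothetical. Renaming every column to an opaque name, while keeping the column order consistent, leaves the prediction unchanged.

```python
import math

def nearest_neighbor_label(train, query):
    """1-nearest-neighbor: return the label of the training point closest to `query`."""
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def predict(records, columns, query):
    """Build feature tuples from `records` in `columns` order, then classify `query`."""
    train = [(tuple(r[c] for c in columns), r["label"]) for r in records]
    return nearest_neighbor_label(train, query)

# Hypothetical toy records using column names mentioned in the thread.
named = [
    {"f_number_of_pets": 1.0, "platinum_albums": 4.0, "label": "good"},
    {"f_number_of_pets": 7.5, "platinum_albums": 0.0, "label": "bad"},
]
# Rename every feature to an opaque name; the values are untouched.
rename = {"f_number_of_pets": "attribute_1", "platinum_albums": "attribute_2"}
anonymous = [{rename.get(k, k): v for k, v in r.items()} for r in named]

query = (2.0, 3.0)
p_named = predict(named, ["f_number_of_pets", "platinum_albums"], query)
p_anon = predict(anonymous, ["attribute_1", "attribute_2"], query)
print(p_named, p_anon)  # good good
```

The names do still matter to the person deciding which features to include and how to transform them; they just don't matter to the fitting step itself.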

JFoss117 4578 days ago

That's true, but to have the best chance of designing a good method/analysis, I need to know what the variables in my analysis mean. Otherwise, it is tougher to make decisions about what variables it makes sense to include in a model, what sorts of transformations make sense, what sort of approaches might work best, etc.

idm 4578 days ago

I would echo this sentiment. Not only are the columns intentionally mis-labeled but they also appear to be computed, meaning some of the variance inherent to the original sample will have been lost.
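A minimal sketch of the variance-loss point, assuming (hypothetically) that a published column is the mean of equal-sized groups of underlying values; the numbers below are made up. Only the group means survive, so the spread inside each group is gone, and with equal-sized groups the population variance of the means can never exceed that of the raw values (law of total variance).

```python
from statistics import pvariance

# Hypothetical underlying values, e.g. four individual counts.
raw = [1, 3, 6, 10]

# Suppose the published column is the mean of each consecutive pair,
# so only one number per pair survives.
pair_means = [(raw[i] + raw[i + 1]) / 2 for i in range(0, len(raw), 2)]

print(pair_means)  # [2.0, 8.0]
# Population variance before and after aggregation; the within-pair
# spread (1 vs 3, 6 vs 10) no longer contributes.
print(pvariance(raw), pvariance(pair_means))
```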

murtali 4578 days ago

How is the submitted code used post contest?

chegra 4578 days ago

Where do you submit your results?
