Hacker News
by plorntus 1515 days ago
I feel like a lot of the comments here are written after only taking the test and many are not reading the rest of the article.

The authors of the website are stating that they believe the study is wrong. The below/above 60 guess is there to show you that it is incorrect half of the time, along with data backing up the claim.

4 comments

Yes, hilarious comments in this thread. Please at least skim the article.
The end of the article was hilarious.

> we decided to reduce our experiment to three tasks because of attention spans (not yours, it is exceptional if you are reading this).

But their data doesn't make sense to me personally...

Only 5% of their dataset is above the age of 60, making their claim that they are getting 50% of their guesses wrong seem like they are calculating it wrong. Surely their cut-off should be at the 95th percentile of the data?

They shouldn't be guessing 'under 60' the same proportion of times as 'over 60', because their population is mostly under 60.

Again though, they are arguing that there is no correlation between randomness and age. This was just a demonstration that when they use randomness to predict age, the results are wrong 50% of the time, which is precisely in accordance with their hypothesis.
Yeah, but their guess shouldn't be wrong 50% of the time, as again that means that they can’t have picked the 95th percentile result! Because it’s 50:50, I’ll assume that they are assigning people scoring higher than average to the “under 60” category, which is obviously incorrect. Otherwise, how do they pick the cut-off?

To explain with another example - let's say that I have a dataset of 100 people's scores at golf (no handicaps) and I know that 5% of them are pro players and the others are 'advanced amateurs'. Because of this I might take the top 5 scores and guess that they are pros, and assign the others the guess of 'advanced amateur'.

Now let's say that there was actually no correlation between people's scores at golf and their 'pro' status - what accuracy would I expect in the above experiment? The answer is actually closer to 90% 'accurate guesses' than 50%! (Although obviously - that's 90% accurate based on random chance).

Now if someone told me they got 50% of the guesses wrong at this task, that implies that they guessed that the top 50% of those golfers were pro rather than picking the top 5% of scores, and I would question the methodology.
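
To put rough numbers on the golf example, here is a minimal sketch. It assumes the guesses are completely independent of who is actually a pro (the "no correlation" case), and the function name `expected_accuracy` is just made up for illustration. A guess is right when both the guess and the truth are 'pro', or when both are 'amateur', so the expected accuracy is the sum of those two probabilities.

```python
def expected_accuracy(base_rate: float, guess_rate: float) -> float:
    """Expected share of correct guesses when guesses are independent of the truth.

    base_rate:  fraction of the population that really is 'pro' (or over 60)
    guess_rate: fraction of the population we label as 'pro' (or over 60)
    """
    both_positive = base_rate * guess_rate              # guessed pro, is pro
    both_negative = (1 - base_rate) * (1 - guess_rate)  # guessed amateur, is amateur
    return both_positive + both_negative


# 5% are pros and we label the top 5% of scores as pros
top_5_percent = expected_accuracy(0.05, 0.05)
# 5% are pros but we label the top half of scores as pros
top_half = expected_accuracy(0.05, 0.50)
# 5% are pros and we never guess 'pro' at all
nobody = expected_accuracy(0.05, 0.0)

print(f"label top 5%:  {top_5_percent:.3f}")
print(f"label top 50%: {top_half:.3f}")
print(f"label nobody:  {nobody:.3f}")
```

Under those assumptions, labelling the top 5% comes out at roughly 90% by chance alone, labelling half the group comes out at 50%, and never guessing 'pro' does best of all, simply because of the base rate.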

This percentage is similar to the dataset on the webpage - I downloaded it, filtered out exclusions, and about 4% of the valid responses are 60 or over.

If I inherently pick a small population (i.e. over-60s are about 4% of this dataset) and I am guessing wrong 50% of the time, it means that my cut-off is incorrectly calibrated. Their score cut-off should, at worst, be picking the wrong 4% and missing another 4%.

Am I going crazy? It seems logical to me, but to be open, maths isn't my strong point. I just know that if I designed the guessing rule, I would be getting more than 50% right (my algorithm would be 'if the user's average score across the three tests is less than -1.5, assign over 60', and that would get about 95% accurate guesses, albeit it would still not prove anything, and I agree with the authors' overall premise!).

In your golf example, making that guess requires additional knowledge of what "pro" means and its frequency among golfers. The data doesn't know that, just like the randomness data doesn't know that most humans are younger than 65 years old. If you really want to figure out how predictive the data is, you shouldn't include considerations like that in your model. I get what you're saying, but ultimately I don't think their goal was to make the most accurate prediction; they wanted to make one that illustrated their point by basing their guess on the data alone.
The calculation involves knowing the age of the sample population though (if you don’t know the ages of your sample, how do you work out what the cut off is at 60 years?).

If I don’t know how many golfers are pro, I simply cannot estimate if it is 100 golfers that are pro or 0 (unless it’s a real gap in scores). Making an assumption that 50 are pro is no more valid than 0 or 100.

If you take the average score of 100 people and say that you estimate anyone scoring below the average is above 60, you are going to be wrong regardless of whether your hypothesis is valid or not.

Putting that up and saying “see, it’s wrong 50% of the time!” doesn’t make sense when your calculation is incorrect.

In order to calculate the cut-off correctly they either need to take the 95th percentile result, or pick a sample where 50% of people are over 60 and 50% are under 60 and take an average of that.

Using a dataset where 95% of people are under 60 and then picking the mean clearly isn’t going to work.
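
A quick way to sanity-check this is a small simulation. This is only a sketch on made-up data, not the site's actual dataset or method: it assumes scores are standard normal, that 5% of people are over 60, and that score and age are completely unrelated. Following the convention used elsewhere in this thread, a low score is read as "over 60", so the percentile cut-off here sits at the bottom 5% of scores. The name `simulate` and its parameters are hypothetical.

```python
import random


def simulate(n: int = 100_000, share_over_60: float = 0.05, seed: int = 0):
    """Compare a mean cut-off with a base-rate cut-off on uncorrelated data."""
    rng = random.Random(seed)
    # Assumption: scores are standard normal and independent of age.
    scores = [rng.gauss(0.0, 1.0) for _ in range(n)]
    over_60 = [rng.random() < share_over_60 for _ in range(n)]

    # Rule A: anyone scoring below the mean is guessed to be over 60.
    mean_cutoff = sum(scores) / n
    # Rule B: only the lowest `share_over_60` of scores are guessed to be over 60.
    percentile_cutoff = sorted(scores)[int(n * share_over_60)]

    def accuracy(cutoff: float) -> float:
        # A guess is correct when "score below cut-off" matches "really over 60".
        hits = sum((s < cutoff) == old for s, old in zip(scores, over_60))
        return hits / n

    return accuracy(mean_cutoff), accuracy(percentile_cutoff)


acc_mean, acc_percentile = simulate()
print(f"mean cut-off accuracy:       {acc_mean:.3f}")
print(f"percentile cut-off accuracy: {acc_percentile:.3f}")
```

With these assumptions the mean cut-off should land near 50% and the percentile cut-off near 90%, even though the scores carry no information about age at all. Neither number says anything about the hypothesis; the gap comes purely from where the cut-off is placed relative to the base rate.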

Yeah, they would be far better off just guessing under 60 every time...
I lol'd at the "Trend line": https://imgur.com/a/ohYbcLL
I'd have read it if it weren't white text on a pink background. I'm not going through the trouble of pulling it up in a browser and undoing what they presumably did on purpose. Then to complain that people don't read the whole thing?