Hacker News
by ZoF 3778 days ago
This implies there aren't future episodes upon which this type of statistical analysis could be applied.

This also strongly implies you think the author is a 'budding data scientist' out of his/her league.

This is very much a 'sample' given the context that South Park is still releasing new episodes.

FYI all elitist 'statisticians' ...

1 comment

If you are trying to figure out what characters will say in future episodes based on their speech in previous episodes, then you are in a prediction context, not a significance-testing context.

As far as I can tell, there are a lot of people out of their league going around with the title "data scientist".

This is not a sample. This is a census at this point in time. The fact that there will be another population tomorrow does not change the fact that you have the entire population of all words spoken by all characters up to today.
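To make the census point concrete, here is a minimal sketch, not anything from the article. It uses a common textbook form of the standard error of a proportion with the finite-population correction. The population below is made up (a list of 0/1 flags standing in for "did this word come from character A"), and `proportion_se` is a hypothetical helper name. Under that finite-population view, once the "sample" is the whole population there is no sampling error left to test against.

```python
import math
import random


def proportion_se(p_hat: float, n: int, N: int) -> float:
    """Standard error of a proportion from n units drawn without
    replacement out of a finite population of N units."""
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    if N == 1:
        return 0.0
    # Finite-population correction: shrinks to 0 as n approaches N.
    fpc = math.sqrt((N - n) / (N - 1))
    return math.sqrt(p_hat * (1 - p_hat) / n) * fpc


# Hypothetical population (an assumption for illustration only):
# 1 = word spoken by character A, 0 = spoken by anyone else.
random.seed(0)
population = [1 if random.random() < 0.3 else 0 for _ in range(10_000)]
N = len(population)

# A genuine sample: the proportion is an estimate with sampling error.
sample = random.sample(population, 500)
p_sample = sum(sample) / len(sample)
se_sample = proportion_se(p_sample, len(sample), N)

# A census: every unit is observed, so the proportion is exact.
p_census = sum(population) / N
se_census = proportion_se(p_census, N, N)

print(f"sample: p = {p_sample:.3f}, SE = {se_sample:.4f}")
print(f"census: p = {p_census:.3f}, SE = {se_census:.4f}")
```

The census standard error comes out as exactly zero because the correction factor is zero when n equals N. Whether a superpopulation model (treating the existing scripts as one draw from some process that generates dialogue) is appropriate is a separate modeling choice that this sketch does not address.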

I am not a statistician. I am an economist who knows enough about statistics and econometrics to know when a significance test is applicable.

Also, note that R's CSV parsing is going to misattribute some characters' speech to others. GIGO speaks loud.

You're the worst kind of intelligent person tbh.

Why be a nitpicking pedant when it is clear this is intended as a throwaway exercise whose only application is predictive...?

You're the one calling people "data scientists"; OP didn't even use the word "science" anywhere in the article.