Hacker News
by nanis 3778 days ago
If you are trying to figure out what characters will say in future episodes based on their speech in previous episodes, then you are in a prediction context, not a significance-testing context.

As far as I can tell, there are a lot of people out of their league going around with the title "data scientist".

This is not a sample. This is a census at this point in time. The fact that there will be another population tomorrow does not change the fact that you have the entire population of all words spoken by all characters up to today.

I am not a statistician. I am an economist who knows enough about statistics and econometrics to know when a significance test is applicable.

Also, note that R's CSV parsing is going to misattribute some characters' speech to others. GIGO speaks loud.
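To make the misattribution point concrete, here is a minimal sketch of one way it can happen. It is written in Python rather than R, and it assumes the failure mode is an unbalanced quote in a dialogue field; I have not checked that this is what actually happens with the dataset in question. The speakers "A" and "B" and the two-column layout are made up for illustration.

```python
import csv
import io

# Hypothetical transcript with columns Character,Line.
# Assumption: speaker A's line opens a double quote that is never closed.
raw = 'Character,Line\nA,"Hello there\nB,Goodbye\n'

rows = list(csv.reader(io.StringIO(raw)))
header, records = rows[0], rows[1:]

# The parser treats everything after the opening quote as part of A's
# field, so B's row is absorbed into A's speech instead of standing alone.
for speaker, line in records:
    print(repr(speaker), repr(line))
```

Any per-character word count built from `records` would then credit B's words to A, and no downstream statistic can recover from that.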

2 comments

You're the worst kind of intelligent person tbh.

Why be a nitpicking pedant when it is clear this is intended as a throwaway exercise whose only application is predictive...?

You're the one calling people "data scientists"; OP didn't even use the word "science" anywhere in the article.