by reader5000
5756 days ago
Eh, their analysis method is not too hot. From the comments section:

"The phrases included in the black boxes are the top 50 phrases most statistically correlated to that group. We calculated this as follows:
1. We calculated the frequency of every 1, 2, and 3 word phrase for the whole population.
2. We calculated those same frequencies within each race/gender pair.
3. For each phrase, we divided #2 by #1.
4. This is the propensity of a given group to use a given phrase.
5. The list you see is the phrases with the 50 highest ratios of #2/#1."

So even if a group uses a phrase only 1.001x more than the population average, it might still be listed: if there are no actual phrase-usage differences, all the ratios will be close to 1 and the top 50 will be arbitrary.
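To make the objection concrete, here is a rough Python sketch of the quoted steps. It is my own reading, not their code: I'm assuming "frequency" means a phrase's count divided by the total phrase count, and the function names (ngrams, phrase_freqs, top_ratios) are made up for illustration.

```python
from collections import Counter

def ngrams(text, max_n=3):
    """Yield every 1-, 2-, and 3-word phrase in a text (whitespace tokenized)."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def phrase_freqs(texts, max_n=3):
    """Relative frequency of each phrase (assumed normalization: total phrase count)."""
    counts = Counter()
    for t in texts:
        counts.update(ngrams(t, max_n))
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()}

def top_ratios(group_texts, all_texts, k=50):
    """Steps 1-5: group frequency / population frequency, highest k ratios.

    group_texts must be a subset of all_texts so every group phrase
    also appears in the population counts.
    """
    pop = phrase_freqs(all_texts)
    grp = phrase_freqs(group_texts)
    ratios = {p: f / pop[p] for p, f in grp.items()}
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)[:k]

# Two groups that write exactly the same things: no real difference at all.
group_a = ["i like long walks", "i like pizza"]
group_b = ["i like long walks", "i like pizza"]
top = top_ratios(group_a, group_a + group_b, k=5)
```

Every ratio here is 1, yet top_ratios still hands back a full list of k phrases, because nothing in the procedure checks whether a ratio is meaningfully above 1.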
Fortunately, we can perform a sanity check: read some of the phrases to someone, and ask that person which group they think the phrases came from. I bet people will guess with high enough accuracy to establish that it's nonrandom.
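If you wanted to put a number on "high enough accuracy", one option is a plain binomial tail: how likely is it to get at least that many guesses right by chance? A minimal sketch, assuming each guess is an independent pick among n_groups equally likely groups (the function name and parameters are hypothetical, and the number of groups is whatever the study actually used):

```python
from math import comb

def p_at_least(correct, trials, n_groups):
    """P(X >= correct) when each of `trials` guesses is right with chance 1/n_groups."""
    p = 1 / n_groups
    return sum(
        comb(trials, i) * p**i * (1 - p) ** (trials - i)
        for i in range(correct, trials + 1)
    )
```

A small value for the observed score would back up the claim that the lists carry real signal. It would not rescue the ranking method itself, only show that the output isn't pure noise.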