| From the article: Although Decety’s paper had reported that they had controlled for country, they had accidentally not controlled for each country, but just treated it as a single continuous variable so that, for example “Canada” (coded as 2) was twice the “United States” (coded as 1). I mean I don't even understand how this seemed like a normal thing to do? |
This mistake would be downright trivial to make in R. Just declare that Country is a Factor (which is the built-in type for categorical variables), and then throw the data into a library whose attitude towards errors is to coerce everything to numbers until the warnings go away.
Background: Factors in R are the idiomatic way to work with categorical data, and they work somewhat like C-style enums except the variants come from the data rather than a declaration. So if you take a column of strings in a data frame and cast it to a Factor, it will generate a mapping where the first distinct value is coded as 1, the second distinct value is coded as 2, etc. Then it replaces the strings with their integer equivalents, and saves the mapping off to the side.
I forget the exact rules (if there are rules, R is a bit lawless), but it's not very hard to peek under the hood at the underlying numeric representation. Many built-in operations "know" that Factors are different (e.g. regressing against a Factor will create dummy variables for each variant), but it's up to each library author how 'clever' they want to be.