by avs733 2456 days ago
Simple...data representation is not the same as data meaning.

I teach an introductory stats course and we hammer this in. Categorical data are often represented as numbers or other short indicators for storage purposes. Typically, if it's multiple choice, the encoding follows the order of the choice options.

I not infrequently see an average of gender, because male = 0 and female = 1 (or vice versa) and someone generates a table without thinking.
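To make that concrete, here is a minimal pandas sketch. The data and the 0 = male / 1 = female coding are made up for illustration, and the behaviour of `mean()` on a categorical column is what I'd expect from recent pandas versions rather than something I've checked everywhere.

```python
import pandas as pd

# Hypothetical survey responses; the coding 0 = male, 1 = female is assumed.
df = pd.DataFrame({"gender_code": [0, 1, 1, 0, 1]})

# Stored as integers, the column averages without complaint.
# The result is a number, but it is not a meaningful "average gender".
numeric_mean = df["gender_code"].mean()

# Map the codes back to labels and store them as a categorical.
labels = {0: "male", 1: "female"}
df["gender"] = df["gender_code"].map(labels).astype("category")

# Counts are the summary that matches what the data mean.
counts = df["gender"].value_counts()

# Asking for the mean of a categorical column is refused (TypeError).
try:
    df["gender"].mean()
    mean_allowed = True
except TypeError:
    mean_allowed = False

print(numeric_mean, dict(counts), mean_allowed)
```

The point is only that the dtype carries the meaning: once the column is declared categorical, the "table without thinking" mistake gets harder to make.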

2 comments

The bigger issue here seems to be the use of ordinals in the data collection process. For instance, a lot of my CSVs don't have them, and R and pandas are perfectly capable of enumerating. Why do you even need to put ordinals in the dataset? Does Excel want this sort of thing or something?
You don't need to, people just do... I've kinda always assumed it connects to the olden days and efficient storage and memory use. "Male" is a four-character string; 1 is an integer.
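For what it's worth, a rough sketch of pandas doing the enumerating itself, so the file can keep the text labels. The sample values are invented; `pd.factorize` and the `.cat` accessor are the standard pandas tools as far as I know.

```python
import pandas as pd

# Hypothetical responses kept as plain text labels, as they might sit in a CSV.
s = pd.Series(["male", "female", "female", "male"])

# factorize assigns integer codes in order of first appearance.
codes, uniques = pd.factorize(s)

# A categorical stores each distinct label once, plus small integer codes.
# Its categories are sorted, so the numbering differs from factorize.
cat = s.astype("category")
cat_codes = cat.cat.codes

print(list(codes), list(uniques))
print(list(cat_codes), list(cat.cat.categories))
```

So you get the compact integer storage either way, and the label-to-number mapping travels with the data instead of living in someone's head.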
Aha! Now we see why there are so many transgendered people these days! :):):)