|
>An R factor is a sequence type much like a character atomic vector except that the values of the factor are constrained to a set of string values, called “levels”. For example, if you have a table of measurements of some widgets and each row corresponds to a single measurement of a single widget, you could have a factor-typed column called measurement.type containing the values “length”, “width”, “height”, “weight”, and “hue”, with the corresponding numeric measurements stored in a “value” column.

This is a very bad example of what factors are for in R, because it makes it seem like factors are for defining variables or keys in key-value pairs. You can use them for that, but it isn't the intended use. A better example: suppose you were comparing the amount of sugar in fruits across several growing locations, and you had three columns: Fruit, Location, and Density (g/L). Fruit would be a factor variable (say it takes the values apple, banana, orange), and Location could be one too, if it were a discrete set of possibilities (as opposed to lat/lon coordinates).

This author seems to forget that R was built for working with data in an analytical setting, unlike all of the languages he's comparing it to. It has crept into other areas, but that seems to be because, in the hands of a skilled user, it makes a data analysis solution far easier to implement. I'm sure someone will come in and say how much better pandas is, but on small datasets I'll stick with R, especially given how brittle and buggy matplotlib is. |
That is the approach for tidy data, which is used a lot in the R tidyverse (http://tidyr.tidyverse.org/articles/tidy-data.html).
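
For anyone who wants to see the fruit example from the comment above in code, here is a minimal base R sketch. The data frame name `sugar`, the location names, and every density number are made up for illustration; only the three column names and the fruit levels come from the comment.

```r
# Hypothetical data: the locations and density values are invented.
sugar <- data.frame(
  Fruit = factor(
    c("apple", "banana", "orange", "apple", "banana", "orange"),
    levels = c("apple", "banana", "orange")
  ),
  Location = factor(c("north", "north", "north", "south", "south", "south")),
  Density = c(100, 120, 90, 110, 125, 95)  # g/L, invented numbers
)

# The levels are the fixed set of values the factor may take.
fruit_levels <- levels(sugar$Fruit)

# Factors make grouped summaries straightforward:
# count the rows per fruit and average the density per fruit.
fruit_counts <- table(sugar$Fruit)
mean_density <- tapply(sugar$Density, sugar$Fruit, mean)

# Assigning a value outside the levels does not add a new level;
# R stores NA instead (and normally emits a warning, silenced here).
fruit <- sugar$Fruit
suppressWarnings(fruit[1] <- "kiwi")
rejected <- is.na(fruit[1])
```

The point of the sketch is that Fruit and Location act as grouping variables for an analysis rather than as keys in a key-value layout, which seems to be the distinction the comment is drawing.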