by stewbrew 3778 days ago
How would you represent categorical data then? R's primary use case isn't text processing. And HW isn't always right.

As character vectors, for instance. In particular, they can do everything factors can do when used in conjunction with `unique`, and ordered factors can be represented as a combination of character and numeric vectors. Factors work better, but only barely. In particular, they are nowadays no more efficient than character vectors (!). They used to be, which is why they are used so liberally throughout R's base libraries.
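A minimal sketch of that idea, for anyone who wants to try it. The data and the names `lvls`, `codes`, `ord_lvls` and `rank` are made up for illustration; only `unique`, `sort`, `match` and `factor` are base R.

```r
# Hypothetical example data; any character vector works.
x <- c("low", "high", "mid", "high", "low")

# Unordered categorical: the distinct values play the role of levels().
lvls  <- sort(unique(x))
codes <- match(x, lvls)   # integer codes, like as.integer(factor(x))

# The factor-based route gives the same levels and the same codes.
f <- factor(x)
stopifnot(identical(lvls, levels(f)))
stopifnot(identical(codes, as.integer(f)))

# Ordered categorical: keep an explicit level order next to the strings
# (the "character plus numeric" combination).
ord_lvls <- c("low", "mid", "high")
rank <- match(x, ord_lvls)
x[rank >= 2]              # values at or above "mid"
```

This only covers the basic bookkeeping. Things like modelling functions that treat factors specially would still need an explicit conversion.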
"In particular, they are nowadays not any more efficient than using character"

How could a comparison of two strings of unknown size be as efficient as comparing two integers? I'm curious to learn something new.

R uses a global string cache, so a string equality comparison is just a comparison of two pointers.
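One rough way to see the cache from inside R, assuming the undocumented `.Internal(inspect())` helper is available in your build (it is an internal debugging aid, and its output format is not guaranteed):

```r
# Hypothetical example vector with a repeated value.
x <- c("apple", "apple", "pear")

# Prints the internal structure of x. Each element is listed as a CHARSXP
# with its memory address; the two "apple" entries should show the same
# address, since both point at one cached string.
.Internal(inspect(x))
```

Note this is about equality only. Sorting or ordering strings still has to look at the characters.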