Hacker News new | ask | show | jobs
by ladon86 1177 days ago
Having every column as a boolean (0/1) means you can treat it as a bitmap. As an (entirely fictional) example, imagine if you wanted to get the features of a thread instead of a single tweet. You could do it as a union of all the tweets:

threadFeatures = tweet1 | tweet2 | tweet2

1 comments

Ok that makes lots of sense from an engineering perspective. It's pretty insane from a statistical perspective though, which I think was the original point.
> It's pretty insane from a statistical perspective though

Efficiency is way more of a concern, at this scale, than the more trivial was of trying to find a competent person that wouldn't misuse the values.