Hacker News
by vslira 47 days ago
Hm, that’s a multinomial classification with a very high cardinality. It’s really weird it works. I’m sure it does as the author states, but for how many authors (out of the whole web) does this work?
4 comments

It worked on me, and I would be shocked if my blog (dmd.3e.org) has more than a dozen readers. I am stunned.
It's not about the readers, just the fact that there's enough of a sample that it can use, with sufficient differentiation from other content.
I’ve posted on average 3 things a year.
There are ~8 billion people. That sounds big, but it's only about 2^33. I.e., if you can find 33 things about the text that each halve the number of possible writers, you have narrowed it down to 1 person.

Add just a couple more such things and you can tolerate some of them being mistaken/wrong/uncertain too.
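A quick sanity check of that arithmetic, as a rough sketch. It assumes each feature splits the remaining candidates exactly in half and that the features are independent, which real stylistic features will not quite do. The function names are made up for illustration.

```python
import math

WORLD_POPULATION = 8_000_000_000  # the ~8 billion figure from the comment


def bits_to_identify(population: int) -> int:
    """Number of perfect halvings needed to get from `population` to at most 1."""
    return math.ceil(math.log2(population))


def remaining_candidates(population: int, halvings: int) -> float:
    """Expected candidates left after `halvings` ideal 50/50 splits."""
    return population / 2 ** halvings


bits = bits_to_identify(WORLD_POPULATION)
print(bits)                                          # 33
print(remaining_candidates(WORLD_POPULATION, 32))    # still more than 1
print(remaining_candidates(WORLD_POPULATION, bits))  # less than 1
```

A feature that splits the population unevenly carries less than one bit, so in practice you would need more than 33 of them.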

Sure the cardinality is high, but the model isn't using a uniform prior. What do you suppose the values of each of the terms are in P(Text sample | Kelsey Piper) * P(Kelsey Piper) / P(Text sample)?
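To make the non-uniform prior point concrete, here is a toy sketch of Bayes' rule, P(author | text) = P(text | author) * P(author) / P(text). The three author labels and every number below are invented purely for illustration; nothing here reflects how any actual model scores authors.

```python
from typing import Dict


def posterior(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Apply Bayes' rule over a closed set of candidate authors."""
    # P(text) is the sum over authors of P(text | author) * P(author)
    evidence = sum(prior[a] * likelihood[a] for a in prior)
    return {a: prior[a] * likelihood[a] / evidence for a in prior}


# Hypothetical prior: prolific writers dominate the training text.
prior = {"prolific_writer": 0.70, "occasional_blogger": 0.25, "rare_poster": 0.05}
# Hypothetical likelihoods: how well the sample matches each author's style.
likelihood = {"prolific_writer": 0.001, "occasional_blogger": 0.002, "rare_poster": 0.2}

post = posterior(prior, likelihood)
best = max(post, key=post.get)
print(best, round(post[best], 3))
```

With these made-up numbers, a distinctive enough sample outweighs a small prior. With a weak likelihood signal, the answer would fall back to whoever has the largest prior.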
Maybe it just says all writing is Kelsey Piper.