| HN Mirror


by caspianm 3518 days ago

Say I have two categories, programming articles and non-programming articles, plus some other data about each article, and I want to predict whether an article will be interesting. I also want to be fair to interesting non-programming articles: the ratio of false negatives to true positives should be the same in the non-programming subset as in the programming subset.

Is there a technical term for that in statistics?

It's like trying to get a representative sample, but representative in only one specific way (topic) and deliberately non-representative in another (interestingness).

I think this could get at one of the things people mean, and it might be interesting to see how this trades off against overall accuracy or representativeness in other categories.
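
To make the criterion above concrete, here is a rough sketch of how the per-category ratio could be measured. The function name, the category labels, and the toy data are all made up for illustration. Note that FN / TP is a monotone function of the false negative rate FN / (FN + TP), so equalizing one across categories equalizes the other.

```python
from collections import defaultdict

def fn_to_tp_ratio_by_group(groups, y_true, y_pred):
    """Return {group: FN / TP}, counted over the truly interesting articles only.

    Hypothetical helper: groups, y_true, and y_pred are parallel sequences,
    where y_true / y_pred are 1 for "interesting" and 0 otherwise.
    """
    fn = defaultdict(int)
    tp = defaultdict(int)
    for g, actual, predicted in zip(groups, y_true, y_pred):
        if not actual:
            continue  # uninteresting articles do not enter this ratio
        if predicted:
            tp[g] += 1
        else:
            fn[g] += 1
    # A group with no true positives gets an infinite ratio.
    return {g: (fn[g] / tp[g] if tp[g] else float("inf"))
            for g in set(fn) | set(tp)}

# Toy data (invented): six programming and six non-programming articles.
groups = ["prog"] * 6 + ["other"] * 6
y_true = [1, 1, 1, 1, 0, 0,  1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0,  1, 1, 0, 0, 0, 0]

ratios = fn_to_tp_ratio_by_group(groups, y_true, y_pred)
for group, ratio in sorted(ratios.items()):
    print(f"{group}: FN/TP = {ratio:.3f}")
```

On this toy data the two ratios differ, so the predictor would not count as fair in the sense described above; closing the gap usually means changing the decision threshold for one category, which is where the trade-off against overall accuracy would show up.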