by ausvisaissues 3585 days ago
I am not completely following what you mean by "98% is of an unknown class, but pulled from a known distribution".

Are you suggesting that 2 percent of the samples are known positives, drawn from p(x|y=1), while the other 98 percent are drawn from the overall distribution p(x) and may each be either positive or negative?

If so, the setting you describe is called "positive and unlabeled (PU)" learning. This paper: http://cseweb.ucsd.edu/~elkan/posonly.pdf is one of the seminal articles on the topic (although the equation at the bottom of page 214 contains a statement that may not necessarily hold true). There are also many more recent papers on the topic.
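To make the setup concrete, here is a rough sketch of the rescaling idea from the paper linked above, as I understand it: fit a classifier g(x) to separate labeled from unlabeled samples, estimate c = p(labeled | y=1) as the average of g over held-out labeled positives, then use g(x)/c as an estimate of p(y=1|x). Everything specific below is my own assumption for illustration and not taken from the paper or from your data: the 1-D Gaussian classes, the sample sizes, the 30% positive rate among unlabeled samples, and the hand-rolled logistic regression. It also relies on the labeled positives being a random sample of all positives.

```python
import math
import random

random.seed(0)

# Toy 1-D data. All distributions and sizes are illustrative assumptions.
def draw_pos():
    return random.gauss(2.0, 1.0)    # samples from p(x | y=1)

def draw_neg():
    return random.gauss(-2.0, 1.0)   # samples from p(x | y=0)

labeled = [draw_pos() for _ in range(300)]            # known positives
unlabeled = [draw_pos() if random.random() < 0.3 else draw_neg()
             for _ in range(1000)]                    # drawn from p(x)

# Hold out some labeled positives for estimating c.
held_out = labeled[:100]
train_x = labeled[100:] + unlabeled
train_s = [1.0] * 200 + [0.0] * 1000   # s=1 means "labeled", not "positive"

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ss, lr=0.5, steps=500):
    """Plain full-batch gradient descent for 1-D logistic regression."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w, grad_b = 0.0, 0.0
        for x, s in zip(xs, ss):
            err = sigmoid(w * x + b) - s
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

w, b = fit_logistic(train_x, train_s)

def g(x):
    """Estimated p(s=1 | x): probability that a sample at x is labeled."""
    return sigmoid(w * x + b)

# Estimate c = p(s=1 | y=1) as the mean of g over held-out labeled positives.
c_hat = sum(g(x) for x in held_out) / len(held_out)

def posterior(x):
    """Rescaled estimate of p(y=1 | x), clipped to stay a valid probability."""
    return min(1.0, g(x) / c_hat)

print("estimated c:", round(c_hat, 3))
for x in (-3.0, 0.0, 3.0):
    print(x, round(posterior(x), 3))
```

The clipping in `posterior` is there because g(x)/c can exceed 1 when g is imperfectly fit. A logistic model is also only an approximation here, since the true g(x) saturates at c rather than at 1, so treat the numbers as a sanity check rather than a calibrated estimate.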