Hacker News new | ask | show | jobs
by otabdeveloper4 828 days ago
Answers to these questions are actually Bayesian statistical models ("what is the probability of Y given a high likelihood of X"), treating these problems as unsupervised classification might work, but that's a very crude way of approaching them.
1 comments

I wouldn't say it's a crude way of approaching the problem, it's a crude way of solving the problem. Taking the fraud example, taking unsupervised approaches to understanding patterns of the data before you impose assumptions on the data is a very useful process. For example, what might be fraudulent behaviors in the first place, assuming you aren't even sure you know what fraud looks like, or that it's actually all been detected? Your goal there might be to detect latent features period, not look at their predictive power for X.

Having understood that question, and built an understanding of what predicts fraud, you would then graduate to build models to understand the extent to which features predict fraudulence.

My point in context of the conversation is that it's useful in a business context to explore and understand that data.