Hacker News new | ask | show | jobs
by jim-greer 3599 days ago
Depending on the business needs, returning outliers can be useful even if there are a bunch of false positives.

I'm not a machine learning guy, but when I was at Kongregate, we had a problem with credit card fraud on our virtual goods platform. It wasn't serious fraudsters, just dipshit teens with their parent's credit card.

I had labeled data: historical transactions, with chargebacks, which I fed into Weka. I included all kinds of stuff we knew about the user. A simple rule-based classifier could pick out risky transactions, with a lot of false positives.

I made a simple tool for our customer service team to review these risky transactions. They would decide whether to warn the user, temporarily block them from buying or temp ban them, or permanently ban them.

This worked pretty well for us. The risk factors were new players, players spending quickly, and users who were dicks - as measured by how often others had muted them in chat, how often they swore in chat, etc.

As an aside, saying "fuck" or "shit" in chat wasn't very predictive of fraud - often those terms aren't signs of an abusive user, since they might just be saying "fuck, I suck at this game". What was predictive was users who said "Gay", "Penis", or "Rape". People who use those terms on a game platform are largely dickheads. So the score for abusiveness became known as the "Gay, Penis, Rape Score" or "GPR" for short.

1 comments

Very cool, thanks! I didn't realize that in certain contexts, many false positive outliers wouldn't necessarily be such a bad thing, especially when they could be further refined with human interaction.