Hacker News
by threeseed 4012 days ago
Possibly. But just as likely they are running their own Hadoop platform and have hired some data scientists.

Most large organisations these days are running their own analytics platform.

2 comments

Does analyzing this actually require joining in large data sets -- that is, larger than will fit on a single machine?

I'd always assumed that the records involved weren't very large, but I don't know much about the problem space, so I'm not sure whether other data gets joined in such a way that the analysis benefits from a cluster.

I forget the exact numbers, but a single year's worth of Medicare Part D claims data is on the order of 1TB. That doesn't include the beneficiary and provider datasets (which link patients and doctors), which you'll need to join against. Also, when detecting fraud like this, you may want to include the other Medicare parts (A, B, C), which are often larger than Part D (D being the newest). That leaves you manipulating on the order of 10TB for a single-year analysis. Finally, since Medicare bills can be corrected for up to 3 years, you may end up joining multi-terabyte datasets.
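For anyone who wants the back-of-envelope arithmetic spelled out, here is a rough sketch. The 1TB, 10TB, and 3-year figures are the order-of-magnitude estimates from the comment above. The single-machine capacity and the function names are hypothetical, picked only for illustration. Multiplying the yearly figure by the full correction window is an assumed upper bound, not a measured number.

```python
# Order-of-magnitude figures taken from the comment above.
PART_D_TB_PER_YEAR = 1        # one year of Part D claims
ALL_PARTS_TB_PER_YEAR = 10    # Parts A-D plus beneficiary and provider data
CORRECTION_WINDOW_YEARS = 3   # bills can be corrected for up to 3 years

# Hypothetical assumption, not from the thread: usable storage on one machine.
SINGLE_MACHINE_TB = 4


def working_set_tb(tb_per_year, years):
    """Upper-bound data volume if every year in the window is joined."""
    return tb_per_year * years


def fits_on_one_machine(total_tb, machine_tb=SINGLE_MACHINE_TB):
    """True if the working set fits within the assumed single-machine capacity."""
    return total_tb <= machine_tb


total = working_set_tb(ALL_PARTS_TB_PER_YEAR, CORRECTION_WINDOW_YEARS)
print(f"{total} TB working set, fits on one machine: {fits_on_one_machine(total)}")
```

Under these assumptions, a single year of Part D alone would fit on one box, while a multi-year, all-parts join would not. Change `SINGLE_MACHINE_TB` to match whatever hardware you have in mind.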
Medicare is one of the largest health schemes in the world in an industry known for massive amounts of paperwork. It's a humongous data set.
Yes, plus some sort of analytics-focused data warehouse (Teradata, etc.). I am almost certain, though, that their analytics team is outsourced to a major contractor like Lockheed Martin.