HN Mirror



	by threeseed 4060 days ago
	Possibly. But just as likely they are running their own Hadoop platform and have hired some data scientists. Most large organisations these days are running their own analytics platform.

2 comments

ipsin 4060 days ago

Does analyzing this actually require joining large data sets -- that is, larger than will fit on a single machine?

I'd always assumed that the records involved weren't very large, but I don't know much about the problem space, so I'm not sure whether other data gets joined in a way that benefits from cluster-based analysis.


jhorey 4059 days ago

I forget the exact numbers, but a single year's worth of Medicare Part D claims data is on the order of 1TB. That doesn't include the beneficiary and provider datasets (which link patients and doctors), which you'll need to join against. Also, when detecting fraud like this, you may want to include the other Medicare parts (A, B, C), which are often larger than Part D (D being the newest). So this leaves you manipulating on the order of 10TB for a single-year analysis. Finally, since Medicare bills can be corrected for up to 3 years, you may end up joining multi-terabyte datasets across several years.
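
As a rough back-of-envelope check of those figures, here is a minimal sketch. The per-year sizes are the approximate numbers quoted above, not exact values, and the assumption that every year in the correction window is about the same size is mine, purely for illustration.

```python
# Back-of-envelope sizing from the rough figures quoted in the comment.
PART_D_TB_PER_YEAR = 1.0      # "on the order of 1TB" per year (approximate)
SINGLE_YEAR_TOTAL_TB = 10.0   # "on the order of 10TB" once Parts A, B, C and
                              # the beneficiary/provider datasets are included
CORRECTION_WINDOW_YEARS = 3   # bills can be corrected for up to 3 years


def window_size_tb(per_year_tb: float, years: int) -> float:
    """Total size of a multi-year window.

    Assumption (not from the thread): each year is roughly the same size.
    """
    return per_year_tb * years


part_d_window = window_size_tb(PART_D_TB_PER_YEAR, CORRECTION_WINDOW_YEARS)
full_window = window_size_tb(SINGLE_YEAR_TOTAL_TB, CORRECTION_WINDOW_YEARS)

print(f"Part D only, {CORRECTION_WINDOW_YEARS} years: ~{part_d_window:.0f} TB")
print(f"All parts, {CORRECTION_WINDOW_YEARS} years: ~{full_window:.0f} TB")
```

Under those assumptions, a full correction window lands in the tens of terabytes, which is the scale where a join stops fitting comfortably on one machine.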


navait 4059 days ago

Medicare is one of the largest health schemes in the world, in an industry known for massive amounts of paperwork. It's a humongous data set.


jhorey 4059 days ago

Yes, plus some sort of analytics-focused data warehouse (Teradata, etc.). I am almost certain, though, that their analytics team is outsourced to a major contractor like Lockheed Martin.
