by shubhamjain 3576 days ago
I am not sure that "Big Data" is an accurate description of what Reliance intends by kick-starting this. Surely, it sounds like a perfect recipe (onboard millions of customers and sell their 'data'), but on closer inspection it doesn't seem to be a smart idea.

Can you even imagine the scale needed to process this kind of data? That's petabytes (rough estimate), every day. Maybe it's theoretically possible, but any investment in this kind of technology would be enormous.

Browsing behaviour data may be valuable, but at what scale? Even a big advertising firm would balk at spending big bucks on this, and remember the scale needed to mine any information out of it. How much valuable business insight can this generate that wasn't possible in the past?

Maybe I am wrong, but I'd be very skeptical if selling data is their master plan.

3 comments

Speaking from experience with DPI: we did a POC project on analysing DPI data with Hadoop, Spark and other big data tech.

You are right about the volumes, but wrong about it being impractical.

The volumes with a relatively small opco:

- ~7m subs

- 250 GB just for the protocol classification. *

- Then you also have URL logs, etc.
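To put those numbers in perspective, here is a rough back-of-envelope sketch. It only uses the two figures above (about 7m subs, 250 GB per day for protocol classification) and assumes the volume scales linearly with subscribers, which is my assumption and not something we measured. The 100m subscriber count is purely hypothetical, and URL logs are not included.

```python
# Back-of-envelope scaling of the protocol-classification volume.
SUBS = 7_000_000   # roughly 7m subscribers (figure from the comment)
GB_PER_DAY = 250   # protocol classification only, per day (figure from the comment)


def per_sub_kb(gb_per_day: float = GB_PER_DAY, subs: int = SUBS) -> float:
    """Average daily volume per subscriber in KB (decimal units)."""
    return gb_per_day * 1_000_000 / subs


def scaled_tb_per_day(target_subs: int) -> float:
    """Linear extrapolation to another subscriber count, in TB per day.

    Linear scaling is an assumption, not a measured result.
    """
    return GB_PER_DAY * target_subs / SUBS / 1000


if __name__ == "__main__":
    print(f"{per_sub_kb():.1f} KB per subscriber per day")
    # 100m is a hypothetical subscriber count, used only for illustration.
    print(f"{scaled_tb_per_day(100_000_000):.2f} TB per day at 100m subs")
```

Under that assumption the classification data stays in the terabytes-per-day range rather than petabytes, before any aggregation.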

Key factors that reduce the costs and investment:

- commodity hardware (with Hadoop etc.)

- distributed

- query patterns

- you do not need to store every single record. The older the data is, the more coarsely it can be aggregated: hourly, then daily, then monthly

This is what we did: the data was aggregated, which significantly reduced storage.
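A minimal sketch of that kind of age-based rollup, in plain Python rather than any of the tools we actually used. The thresholds (`HOURLY_AFTER`, `DAILY_AFTER`, `MONTHLY_AFTER`) and the `(timestamp, protocol, bytes)` record layout are hypothetical, picked only to illustrate the idea.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical retention thresholds; real ones depend on query patterns.
HOURLY_AFTER = timedelta(days=1)
DAILY_AFTER = timedelta(days=7)
MONTHLY_AFTER = timedelta(days=90)


def bucket_for(ts: datetime, now: datetime) -> datetime:
    """Truncate a timestamp to a coarser bucket the older the record is."""
    age = now - ts
    if age >= MONTHLY_AFTER:
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if age >= DAILY_AFTER:
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if age >= HOURLY_AFTER:
        return ts.replace(minute=0, second=0, microsecond=0)
    return ts  # recent records keep full resolution


def rollup(records, now: datetime) -> dict:
    """Sum bytes per (bucket, protocol).

    records: iterable of (timestamp, protocol, byte_count) tuples.
    """
    totals = defaultdict(int)
    for ts, protocol, nbytes in records:
        totals[(bucket_for(ts, now), protocol)] += nbytes
    return dict(totals)
```

Many old records collapse into a single row per bucket and protocol, which is where the storage saving comes from.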

Tested various options: Hive, HBase, Druid

Edit: * per day

Scale is not at all a problem here, but I am not sure how they plan to pull this off, considering so many privacy violations.

Also, maybe some people in India won't mind accepting this if Reliance gives LTE internet at very low cost; others might just not use it, I guess.