by shubhamjain
3576 days ago
I am not sure "Big Data" accurately describes Reliance's intentions in kick-starting this. Surely, it sounds like a perfect recipe (onboard millions of customers and sell their 'data'), but on a second look it doesn't seem to be a smart idea. Can you even imagine the scale needed to process this kind of data? That's petabytes (rough estimate), every day. Maybe it's theoretically possible, but any investment in this kind of technology would be enormous. Browsing behaviour data may be valuable, but at what scale? Even a big advertising firm would balk at spending big bucks on this, and remember the scale needed to mine any information out of it. How many valuable business insights can this generate that weren't possible in the past? Maybe I am wrong, but I'd be very skeptical if selling is their master plan.

You are right about the volumes, but wrong about it being impractical.
The volumes for a relatively small opco:
- roughly 7m subscribers
- 250 GB just for the protocol classification. *
- Then you also have URL logs etc.
Key factors that reduce the cost and investment:
- commodity hardware (with Hadoop etc.)
- distributed
- query patterns
- you do not need to store every single record. The older the data is, the more coarsely it can be aggregated: hourly, then daily, then monthly.
This is what we did: the data was aggregated, which significantly reduced storage.
We tested various options: Hive, HBase, Druid.
Edit: * per day
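
To make the age-based aggregation above concrete, here is a minimal sketch in plain Python. It is not the pipeline we actually ran; the record layout (timestamp, protocol, bytes), the function names and the three age thresholds are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical retention thresholds; real values depend on the query patterns.
HOURLY_AFTER = timedelta(days=1)
DAILY_AFTER = timedelta(days=7)
MONTHLY_AFTER = timedelta(days=90)


def bucket_start(ts, now):
    """Truncate a timestamp to a coarser bucket the older the record is."""
    age = now - ts
    if age >= MONTHLY_AFTER:
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if age >= DAILY_AFTER:
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if age >= HOURLY_AFTER:
        return ts.replace(minute=0, second=0, microsecond=0)
    return ts  # recent data keeps its original resolution


def roll_up(records, now):
    """Sum bytes per (bucket, protocol); records are (timestamp, protocol, bytes)."""
    totals = defaultdict(int)
    for ts, protocol, nbytes in records:
        totals[(bucket_start(ts, now), protocol)] += nbytes
    return dict(totals)
```

Many raw rows collapse into one row per bucket and protocol, which is where the storage saving comes from. The price is that per-record detail for old data is gone, so the thresholds have to match what you still expect to query.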