Hacker News
by 0xmohit 3581 days ago
Quoting:

  Privacy: An unnamed Jio executive mentioned “deep packet
  inspection” to Reuters, saying: “It’s called deep packet
  inspection, and what you can do with the analytics of that is
  mind-boggling,” he said, referring to a practice that digs into
  “packets” of data created by computers for efficiency, mining
  them for information. If this is happening and Jio is accessing
  data packets to develop patterns of user data consumption, this
  is a major privacy violation. The company deserves to be taken
  to court for this, as much as India needs a privacy law.
This essentially implies that they would earn more revenues by analyzing one's browsing behavior, performing analytics and selling the data. Awesome.

Welcome to India!
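The "deep packet inspection" in the quote means reading past packet headers into the payload itself. As a rough illustration only (this is not Jio's system or any vendor's engine), a minimal Python sketch of signature-based protocol classification could look like the following. The function names `classify_payload` and `http_host` and the small signature set are hypothetical choices made for this example.

```python
from typing import Optional

# Hypothetical, tiny signature set; real DPI engines use far more,
# track flow state and reassemble TCP streams before matching.
HTTP_METHODS = (b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ", b"OPTIONS ")


def classify_payload(payload: bytes) -> str:
    """Guess the application protocol from the first bytes of a TCP payload."""
    if payload.startswith(HTTP_METHODS):
        return "http"
    # TLS record header: content type 0x16 (handshake), then major version 0x03.
    if len(payload) >= 3 and payload[0] == 0x16 and payload[1] == 0x03:
        return "tls"
    # SSH peers open with an identification string such as "SSH-2.0-...".
    if payload.startswith(b"SSH-"):
        return "ssh"
    return "unknown"


def http_host(payload: bytes) -> Optional[str]:
    """Return the Host header of a plaintext HTTP request, if present."""
    for line in payload.split(b"\r\n")[1:]:  # skip the request line
        if not line:  # a blank line ends the header block
            break
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"host":
            return value.strip().decode("ascii", "replace")
    return None


req = b"GET /news HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n"
print(classify_payload(req), http_host(req))  # http example.com
```

For TLS traffic the payload itself is encrypted, so a matcher like this only learns the protocol; production engines can still read some handshake metadata such as the server name (SNI).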


I am not sure "Big Data" accurately describes Reliance's intentions in kick-starting this. Sure, it sounds like a perfect recipe (onboard millions of customers and sell their 'data'), but on a second look it doesn't seem to be a smart idea.

Can you even imagine the scale needed to process this kind of data? That's petabytes (rough estimate) every day. Maybe it's theoretically possible, but any investment in this kind of technology would be enormous.

Browsing behaviour data may be valuable, but at what scale? Even a big advertising firm would balk at spending big bucks on this, and remember the scale needed to mine any information out of it. How much valuable business insight can this generate that wasn't possible in the past?

Maybe I am wrong, but I'd be very skeptical if selling the data is their master plan.

Speaking from experience with DPI: we did a POC (proof-of-concept) project on analysing DPI data with Hadoop, Spark and other big data tech.

You are right about the volumes, but wrong about it being impractical.

The volumes at a relatively small opco (operating company):

- ~7M subscribers

- 250 GB just for the protocol classification.*

- Then you also have URL logs, etc.
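The poster's own figures (about 7M subscribers and 250 GB per day for protocol classification alone, per the edit further down) give a per-subscriber number that can be scaled up. A back-of-envelope sketch in Python: the 100M-subscriber base is a hypothetical input chosen only for illustration, the projection assumes volume grows linearly with subscribers, and URL logs are not included.

```python
GB = 10**9
TB = 10**12


def per_subscriber_bytes(daily_bytes: float, subscribers: int) -> float:
    """Average bytes of classification records per subscriber per day."""
    return daily_bytes / subscribers


def scaled_daily_bytes(per_sub: float, subscribers: int) -> float:
    """Daily volume if every subscriber produced the same average (linear scaling)."""
    return per_sub * subscribers


# Figures from the comment: ~7M subs, ~250 GB/day of protocol classification.
per_sub = per_subscriber_bytes(250 * GB, 7_000_000)
# Hypothetical subscriber base, for illustration only.
projected = scaled_daily_bytes(per_sub, 100_000_000)

print(f"{per_sub / 1000:.1f} KB per subscriber per day")       # 35.7 KB ...
print(f"{projected / TB:.2f} TB per day for 100M subscribers")  # 3.57 TB ...
```

Under these assumptions the classification records alone land in the terabytes-per-day range; raw traffic and URL logs would add to that.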

Key factors that reduce the costs and investment:

- commodity hardware (with Hadoop, etc.)

- distributed storage and processing

- query patterns

- you do not need to store every single record: the older the data, the coarser it can be aggregated (hourly, then daily, then monthly)

This is what we did: the data was aggregated, which significantly reduced storage.
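The age-based aggregation described above can be sketched in a few lines. This is a minimal in-memory Python illustration, not the poster's Hadoop/Hive pipeline: the record shape `(timestamp, subscriber_id, protocol, byte_count)` and the retention thresholds in `grain_for_age` (7 days hourly, 90 days daily, monthly after that) are hypothetical assumptions.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical DPI record: (timestamp, subscriber_id, protocol, byte_count)
Record = Tuple[datetime, str, str, int]
Key = Tuple[datetime, str, str]


def truncate(ts: datetime, grain: str) -> datetime:
    """Round a timestamp down to the start of its hour, day or month."""
    if grain == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if grain == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if grain == "month":
        return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown grain: {grain}")


def grain_for_age(age_days: float) -> str:
    """Hypothetical retention policy: older data is kept at a coarser grain."""
    if age_days <= 7:
        return "hour"
    if age_days <= 90:
        return "day"
    return "month"


def roll_up(records: Iterable[Record], now: datetime) -> Dict[Key, int]:
    """Sum bytes per (time bucket, subscriber, protocol); bucket size depends on age."""
    totals: Dict[Key, int] = defaultdict(int)
    for ts, sub, proto, nbytes in records:
        age_days = (now - ts).total_seconds() / 86400
        bucket = truncate(ts, grain_for_age(age_days))
        totals[(bucket, sub, proto)] += nbytes
    return dict(totals)


now = datetime(2016, 9, 30)
records = [
    (datetime(2016, 9, 29, 10, 5), "sub1", "http", 500),
    (datetime(2016, 9, 29, 10, 40), "sub1", "http", 300),  # same hour: merged
    (datetime(2016, 8, 2, 9, 0), "sub1", "tls", 1000),
    (datetime(2016, 8, 2, 18, 0), "sub1", "tls", 2000),    # same day: merged
    (datetime(2016, 3, 5, 1, 0), "sub2", "ssh", 10),
    (datetime(2016, 3, 20, 1, 0), "sub2", "ssh", 20),      # same month: merged
]
rolled = roll_up(records, now)
print(len(records), "records ->", len(rolled), "rows")  # 6 records -> 3 rows
```

The storage saving comes from the key: once data is old enough, every record for a subscriber and protocol within the same day or month collapses into a single row.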

We tested various options: Hive, HBase, Druid.

Edit: * per day

Scale is not a problem at all here, but I am not sure how they plan to pull this off, considering the many privacy violations involved.

Also, maybe some people in India won't mind accepting this if Reliance offers LTE internet at a very low cost; others might just not use it, I guess.

> This essentially implies that they would earn more revenues by analyzing one's browsing behavior, performing analytics and selling the data. Awesome. Welcome to India!

It's not like this is a uniquely Indian problem. The largest wireless provider in the US (Verizon) was caught doing basically exactly this, and other (broadband) telecom providers are trying to do the same.

Taking them to court :) :) ... forget it...