by HCIdivision17 3424 days ago
Awesome. Note that this isn't merely a fine, but also comes with the stipulation that they "prominently disclose and obtain affirmative express consent for its data collection and sharing practices, and prohibits misrepresentations about the privacy, security, or confidentiality of consumer information they collect". And they need to destroy the data collected before March, last year.

That's pretty good! At the very least, this will make it so people are more aware of the constant telemetry. Some find that sort of feature useful, and others find it chilling, but at least this is a step in the direction of making it obvious.


Sure, they can destroy their data, but everything shared with those 3rd parties is still at those third parties. It's not exactly accomplishing much.
> Sure, they can destroy their data, but everything shared with those 3rd parties is still at those third parties.

Agreed.

I think it would be a more effective deterrent if VIZIO had to work with those third parties to locate and delete the material that was transferred to them. This would 1) nullify the contract between VIZIO and those parties, forcing VIZIO into back payment, and 2) create an annoyance for the third parties, hopefully making them think to ask how any data they're purchasing is being collected.

At a minimum I think that customers whose data was collected prior to March 2016 have a right to know which third-party companies purchased their information.

In California, there is the "Shine the Light" law [0] that requires a company to release third-party information to a consumer if there is identifiable information given to third parties along with the data collected. So in this case, Vizio would be required (at least for California residents) to release those third parties' names and the associated data collected from you. [0] http://leginfo.legislature.ca.gov/faces/codes_displaySection....
Thanks for the additional information.

This is a step in the right direction but it's unfortunate that the obligation to disclose appears to be opt-in and not opt-out as detailed in paragraph (a).

> that business shall, after the receipt of a written or electronic mail request, or, if the business chooses to receive requests by toll-free telephone or facsimile numbers, a telephone or facsimile request from the customer, provide all of the following information to the customer free of charge

To compile a list of all companies that have their personal information an user would have to identify every business they have a business relationship with that could possibly be gathering this information and then send a written request to each on a regular basis as the request is only valid for information disclosed in the proceeding year. It seems then that this law only really covers consumers in the event that they find one specific and recent instance where they'd like this information disclosed.

Correction:

> To compile a list of all companies that have their personal information an user would have to identify every business they have a business relationship with that could possibly be gathering this information and then send a written request to each on a regular basis as the request is only valid for information disclosed in the preceding year.

As one of those customers, I definitely think so!
This may or may not help you, but it's typical in these sorts of data deals to

(1) mostly sell aggregate data (eg these demographics actually watch these shows / actually saw these commercials). You'd probably be more interested in the latter in order to connect commercials with purchasing habits, but you're going to operate at the zip code or grocery store level.

(2) If you are selling individual records, make up identifiers and don't tie them to IP addresses. Both because of privacy concerns and to force your ad-vendor customers to continue to purchase the dataset.

(3) from the perspective of someone in the ad industry, I don't buy 11m cookies for ad targeting. These data deals require custom programming on both sides, time from bizdev at both vizio and ad companies, and for ad-company sales to be instructed and helped to sell to their customers. So unless Vizio tv viewing data has pretty high reach, I'm just not interested. I can't really see someone interested in 11m cookies unless that data is integrated with all available tv viewing data from Vizio, Netflix, Samsung, set-top boxes, etc. I'm aware of some pieces of that being sold, but not all of them.

(3a) also, from the in-industry perspective, household data is often not that helpful. You're going to get demographics, if you get them, from the person in that household that happens to pay the bills. That's often unrelated to the person that spends time consuming media. So if eg parent X pays the bills in that household, but kids or parent Y spend the most time watching tv, this data is nowhere near as helpful for ad targeting as you would think.

So, I worked there along with the others who are all HN regulars. I can't comment on this fine at all, but I can comment on the reaction:

The system for identifying an individual via their digital habits is advanced and (in internet terms) ancient.

The credit card industry, for example, is way more an invasion of your privacy than what are effectively Nielsen ratings on steroids... so I think people overreact to this.

The fact is that if you look at Netflix, they have way more specific viewing habit info than any random TV which can state what it is watching. They already have their customer info, demographics, if they have kids, if they have account leechers like a brother or a friend who maintains a profile. They can see what IP/Device/app install anything is coming from -- and they have agreements with various device manufacturers to NOT track their (Netflix's) viewership/app use etc...

Netflix is probably the most savvy digital media company at this point.

While this data will enrich various entities over time at the expense of 100% privacy as to the content one is viewing, I would state that one would be better served to be worried about their chrome and credit card history than the viewing of particular TV shows.

Additionally - having a very intimate knowledge of how the Vizio system works, I would not be concerned about this at all in the scheme of things, as it's literally impossible to have a system watching all media streams on TVs throughout the world.

Finally, Vizio has done a stand-up job of enforcing opt-in/opt-out in the actual firmware of every set.

While Netflix does have a lot of data about what you watch on Netflix, I believe that the reason people react so strongly to things like this is that, as the platform provider, using automated content recognition and other techniques, Vizio (or Samsung, ...) can know everything that you watch across all sources flowing through the TV. Even including things such as YouTube, linear broadcast TV, etc. That's a lot broader surface area than Netflix has...
This is not feasibly accurate currently.

There are contractual stipulations, by companies such as Netflix, for example, that preclude image sensing on screens. (Netflix doesn't want anyone else having their viewer data, as one logical argument) -- that doesn't mean that Netflix doesn't share viewer data with other third parties... [I have no idea if they do, I haven't read their policy]

But here is the issue that 99% of people fail to get: The TV can only ID what it is that you are watching if the system has also been watching the same video/seen the same video/is also watching the same in real-time as you watch it.

So, yeah, it is impossible to ID any and all.

Further, the agreements between companies like Vizio and others are very specific as to what is legal and allowed.

Having been there -- these guys are on the up-and-up, and while we all want to have the right to do anything we want in secret, there is nothing to panic about. However, there is a larger question that is raised regarding privacy: we already have laws around PCI/PII/Med data -- media consumption data is an open issue. How much behavioral data do you think Facebook has? "Show me the total count of males in Brazil between the age of 18-24 that identifies as single and lives within 50 miles of Rio who liked [object] where name begins with the letter 'R'" -- Yeah, I wouldn't worry about what TV show a Vizio TV reported as displaying.

The FB example shows that you were at your machine, and clicked on the [object] etc...

Vizio (and all other brands') TVs are running in kiosk/unattended mode all over the place. How many screens in every sports bar were on last night? Well, they can certainly ID the # of TVs that were watching the Super Bowl, but there is likely more than one person at each screen. So, the worry about your demographics is meaningless in this case. Same as an election/election-debate.

But, as an aggregate, you can see where the attention of the millions of TVs is pointed.

Like I said - it is simply Nielsen ratings, but much much more accurate.

> Like I said - it is simply Nielsen ratings, but much much more accurate.

You keep saying that. But if I'm not mistaken, Nielsen families are paid for their participation. When can I expect a check from Vizio?

That's why it is opt-in.... just like every other form of in-line marketing.

Are you expecting a check from Google for your use of Gmail? What's more invasive, Google reading your emails to mom about your colonoscopy, or the fact that Vizio knows that your TV watched the Super Bowl last night?

> The TV can only ID what it is that you are watching if the system has also been watching the same video/seen the same video/is also watching the same in real-time as you watch it.

Aren't there digital watermarks on all broadcast TV shows and advertisements? If not, if there's a fingerprinting algorithm that can run on a screen's hardware, or filename matching for USB, media could still be identified. No need for the rest of "the system" to have video files in advance, or at all.

Sort of - but not quite.

Think of it like this; people are concerned that the system can do Who, What, When, Where, Why, How, How-Much, Who-do-they-know,... etc,...

It can't. Surely things can be inferred... but nothing that should get you riled up any more than any other online service you have ever used. Plus - the opt-out functions actually work.

> There are contractual stipulations, by companies such as netflix, for example, that preclude image sensing on screens.

So you're saying Vizio and Netflix have a contract such that Vizio TVs will not report to Vizio about what is being displayed on the screen if it's a Netflix stream? That sounds dubious. Maybe they could have a built-in Netflix app ignore such content, but what about Netflix streamed from a separate device via HDMI?

Yes.

But you have to think about the economics of the ingest side... how much does it cost to, as a client, ingest every single Netflix show? Not going to happen. Plus it violates many companies' TOS.

This is a non-issue, IMO, and people shouldn't worry about it to the same extent that one should worry about FB and GOOG and AAPL's abilities...

This is a scapegoat.

That and they weren't exactly upfront about their collection.
Can you expand a little on how the article said the identification worked? It says it takes a set of pixels. Does this patch of pixels get stored? How big is it? If it's stored, how is it protected? Is it encrypted at rest?

I have a Vizio TV, and I use it as a computer monitor. There's every possibility that PII or plaintext credentials might have been transmitted as part of this collection scheme. What did you do to mitigate that danger?

No PII is detected or ever "seen" by the system.

The way the system works is that there are a series of patches on the screen, and the RGB values of the collection of patches are captured to create a fingerprint of what is being displayed on the screen.

This fingerprint is sent to the detection engine that has a DB of all the screens that were ingested into the content DB.

The system simply looks up the fingerprint value against the vast DB to see if it was something that was ingested.
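
For anyone trying to picture the patch-fingerprint-lookup flow described above, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: the patch positions, the patch size, the quantization step, and the exact-match dictionary are all made up, and none of it is Vizio's actual code. A production ACR system would presumably use fuzzier matching than a plain dict lookup.

```python
from typing import Dict, List, Optional, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0-255
Frame = List[List[Pixel]]      # rows of pixels

# Hypothetical layout: top-left (row, col) corner of each sampled patch.
PATCHES = [(0, 0), (0, 4), (4, 0), (4, 4)]
PATCH_SIZE = 2   # each patch is PATCH_SIZE x PATCH_SIZE pixels
QUANT = 32       # coarse buckets so small noise maps to the same value


def fingerprint(frame: Frame) -> Tuple[int, ...]:
    """Average each patch per channel, then quantize into coarse buckets."""
    values = []
    for r0, c0 in PATCHES:
        pixels = [frame[r][c]
                  for r in range(r0, r0 + PATCH_SIZE)
                  for c in range(c0, c0 + PATCH_SIZE)]
        for channel in range(3):
            mean = sum(p[channel] for p in pixels) // len(pixels)
            values.append(mean // QUANT)
    return tuple(values)


# Content DB: only frames that were ingested ahead of time can ever match.
content_db: Dict[Tuple[int, ...], str] = {}


def ingest(frame: Frame, label: str) -> None:
    content_db[fingerprint(frame)] = label


def lookup(frame: Frame) -> Optional[str]:
    """Return the ingested label, or None if this content was never ingested."""
    return content_db.get(fingerprint(frame))


def solid_frame(color: Pixel, size: int = 6) -> Frame:
    """Test helper: a size x size frame filled with one color."""
    return [[color for _ in range(size)] for _ in range(size)]
```

Note that only the short tuple of quantized averages leaves the `fingerprint` function, not the pixels themselves, and a frame that was never passed to `ingest` (a desktop, a security camera feed) simply returns `None` from `lookup`.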

The only things ingested are broadcast television shows. No Netflix, YouTube, etc...

So anytime you use the screen as a monitor, or a kiosk, or a security camera display - anything other than an actual television - the system will not recognize that you're watching Ellen at 4pm PST and are currently 10 minutes into the show.

That's all it does.

The goal was to have overlay events that allow for interactivity if a certain show or commercial is shown. That system didn't really make it too far in production.

Finally, if you're using a TV as a monitor - the system will see that it has never detected anything from that particular TV and it will simply ignore it. At certain points all the TVs that had never detected any TV ACR were just turned off and told to not talk to the system at all.

There really is nothing personal to worry about with this, IMO - and I am not "defending Vizio" -- I just know very intimately how the thing works as I helped build some of it, and I know that it's not nearly as invasive as people think.

For example - there is a lot of foreign content on TV - Spanish, Chinese, Filipino, Indian, etc... none of this is ingested, so it is never detected.

If Vizio made complete backups (say full HDD clones) of all their computers on a weekly basis, do all of the backups need to be destroyed? Is it feasible to open every backup and delete the relevant information? Is it possible to "forget" that a backup process exists and still maintain the data?
That's unimportant. What is important is legal penalties for accessing or selling that data. No-one with assets and in-house counsel would dare violate an order like this.
I've always wondered whether a datastore built on an immutable architecture could be designed to cope with an expectation of receiving court orders to delete data. I think you'd arrive at a somewhat "DRM"-like design. That is:

1. the datastore system would be designed as an "appliance", intended to be installed directly on hardware, and would mandate (and check that) the hardware it was installed on provided both a TPM to store disk encryption keys in, and a full Secure Boot trust-chain granting only its bootloader boot privilege;

2. the datastore software would maintain a mutable index within the store (in the Merkle-tree-ref sense) of all data that is to be "considered deleted"—a master "tombstone" record, in the DBMS terminology—and would prevent anyone from accessing said data through the system's API.

With such a design, the data is effectively "gone", just as if it was really erased from the disks; the only way for a company running such a datastore to "recover" the data would be to find an exploit in the appliance allowing them to modify either the tombstone list (somewhat easy to thwart by choice of data structure), or the code that applies the tombstone policy.
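
To make point 2 concrete, here is a minimal Python sketch of the tombstone idea, under loud assumptions: the class and method names are hypothetical, the store is an in-memory dict rather than disks, and the TPM / Secure Boot part of point 1 is not modeled at all. It only shows an append-only, content-addressed store whose read API refuses anything on a grow-only tombstone list.

```python
import hashlib
from typing import Dict, Optional, Set


class TombstonedStore:
    """Append-only blob store; 'deleted' data stays on disk but is unreadable via the API."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}   # immutable data, never removed
        self._tombstones: Set[str] = set()   # grow-only list of "considered deleted" keys

    def put(self, data: bytes) -> str:
        # Content-addressed: the key is the SHA-256 of the data itself.
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)
        return key

    def tombstone(self, key: str) -> None:
        # There is deliberately no method to remove a tombstone.
        self._tombstones.add(key)

    def get(self, key: str) -> Optional[bytes]:
        # The policy check lives in the only read path the API exposes.
        if key in self._tombstones:
            return None
        return self._blobs.get(key)
```

Because keys are content hashes, re-inserting the same bytes after a tombstone yields the same key, so the data stays hidden. In this sketch the guarantee is only as strong as the code path in `get`, which is exactly why the design above wants the appliance locked down.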

In addition: Have per-object encryption keys and destroy those when data has to be wiped.
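
A rough Python sketch of that per-object key idea, with heavy caveats: the names are hypothetical, and the XOR keystream built from SHA-256 is a toy stand-in so the example runs on the standard library alone. It is not authenticated and should not be used as-is; a real system would use a vetted cipher such as AES-GCM.

```python
import hashlib
import secrets
from typing import Dict, Optional


def _keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher (assumption, NOT production crypto): XOR with a SHA-256 counter keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class ShreddableStore:
    """Each object gets its own random key; destroying the key 'wipes' the object."""

    def __init__(self) -> None:
        self._ciphertexts: Dict[str, bytes] = {}  # bulky, may live on in backups
        self._keys: Dict[str, bytes] = {}         # small, mutable, kept out of backups

    def put(self, object_id: str, data: bytes) -> None:
        key = secrets.token_bytes(32)  # fresh key per object, never reused
        self._keys[object_id] = key
        self._ciphertexts[object_id] = _keystream_xor(key, data)

    def get(self, object_id: str) -> Optional[bytes]:
        key = self._keys.get(object_id)
        if key is None:
            return None  # unknown object, or its key was shredded
        return _keystream_xor(key, self._ciphertexts[object_id])

    def shred(self, object_id: str) -> None:
        # Only the key is destroyed; the ciphertext can remain in old backups.
        self._keys.pop(object_id, None)
```

This also speaks to the backup question upthread: if the weekly clones only ever contain ciphertext and the keys are stored separately, destroying one small key makes every backed-up copy of that object unreadable.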
Have a look at what datomic does. http://docs.datomic.com/excision.html
It's like every service provider's wet dream to arbitrarily lock up our private files. Currently only the Russian cybercriminals are a bit ahead of the competition.