by yellowapple 2243 days ago
> Advertisement as implemented today is a privacy hazard, but there are other ways to do it, client-side, which is what Cliqz attempted.

https://en.wikipedia.org/wiki/Cliqz#Integration_with_Firefox: "According to the Firefox support website, this version of Firefox collects and sends data to the Cliqz corporation including text typed in the address bar, queries to other search engines, information about visited webpages and interactions with them including mouse movement, scrolling, and amount of time spent; and the user's interactions with the user interface of the Cliqz software. This data is tied to a unique identifier allowing Cliqz to track long-term performance."

Yep, real "client-side", eh?

Even if it was actually client-side, that's cold comfort; the data's still being collected and presumably persisted, and there's no telling whether or not some future software update will make that locally-stored data not-so-locally-stored anymore.


This claim on Wikipedia is factually incorrect: "This data is tied to a unique identifier allowing Cliqz to track long-term performance."

Thanks for noticing it, we will create an issue.

UUIDs apply only to telemetry, which is not the data described in that paragraph (queries, scrolling, amount of time spent, URLs, etc.). For this kind of user data (HumanWeb) there is no UUID, neither implicit nor explicit.

There are plenty of papers on the topic, independent audits, the code is open-source and the data can be inspected. HumanWeb data is 100% record-unlinkable: we have no way to know if two messages received come from the same person or not.

> This claim on Wikipedia is factually incorrect: "This data is tied to a unique identifier allowing Cliqz to track long-term performance."

That claim comes directly from Mozilla's support page on the subject¹:

> Firefox shares the following data with Cliqz to provide functionality and improve performance of the Cliqz feature for everyone:

> - Search queries & webpage data: This includes text as you type in the address bar, queries you send to certain search engines, and data about the webpages you visit and interactions with those pages, such as mouse movements, scrolls, and time spent.

> - Interaction data: This includes your interactions with specific fields and buttons in the Cliqz feature. This data is tied to a unique identifier allowing Cliqz to understand performance over time.

So, if that's "factually incorrect", you should take it up with your business partners.

> There are plenty of papers on the topic, independent audits, the code is open-source and the data can be inspected. HumanWeb data is 100% record-unlinkable, we have no way to know if two messages received come from the same person or not.

For now. Things can always change, and promises can always be broken. It'd be a lot easier to trust Cliqz if it wasn't collecting such data at all, let alone sending it to remote servers with a pinky promise that it's anonymized.

----

¹: https://support.mozilla.org/en-US/kb/cliqz-recommendations-f...

> we have no way to know if two messages received come from the same person or not.

This is accurate. They partnered with us at FoxyProxy to prevent browser telemetry from revealing users' IP addresses and other metadata.

These guys are above board, and even if there may have been a problem in 2017 with Firefox, that was no longer the case in 2018, 2019, and 2020. They bent over backwards and jumped through many hoops to hide their users' identity. They were very interested in solving the engineering problem around anonymization. I know this from first-hand experience.

This is a loss larger than many people realize. There are so few companies with such integrity and who put their users first, above profits or shareholders.

There were no problems in 2017 or before; we were doing exactly the same during the Firefox times (we went through security and privacy audits). Data collection is and always was safe with respect to privacy.

Why the ruckus then? Because some assume that if data is sent, privacy is compromised, period. They do not know how to do it, and they assume it's impossible. Instead of checking the claims for themselves (code is public, data can be inspected, documentation, etc.) they prefer to stick to their belief system, which is more comfortable and does not imply hard work. The press release from FF, written by one of these people with a lot of biases and published without review, did not help, as it was misleading.

We made a big mistake back then. Instead of rebutting it, we chose to ignore the FUD, assuming that facts would prevail. They did not.

Sadly, the community is "scared": we have been congratulated and lauded by everyone who checked our systems, but never endorsed in public, since there is little to gain and a lot to lose (you are getting a sneak preview right now).

Sad story, extremely frustrating too, but there is nothing we can do now.

> Why the ruckus then? Because some assume that if data is sent, privacy is compromised, period. They do not know how to do it, and they assume it's impossible. Instead of checking the claims for themselves (code is public, data can be inspected, documentation, etc.) they prefer to stick to their belief system, which is more comfortable and does not imply hard work.

If my eyes rolled any harder I'd likely pull a muscle.

Let's dissect this a bit:

> Because some assume that if data is sent, privacy is compromised, period.

It ain't about it being sent (though that's bad, too). It's about it being collected at all. Cliqz collects and aggregates my data somewhere, and that is therefore a violation of my privacy, even if (for now) it's on my local machine (I could certainly routinely delete that collected data, much like I do with cache and cookies, but then what's the point of using Cliqz in the first place?).

> Instead of checking the claims for themselves (code is public, data can be inspected, documentation, etc.)

I have checked the claims for myself (to the best of my ability). None of them address the very real concern of the aggregated data being, you know, aggregated. Just because it's on my local machine doesn't mean it's guaranteed to stay that way; every second it's on my machine is a liability that anyone who's privacy-conscious would want to eliminate (and anyone who's not privacy-conscious doesn't care about).

Like, there's no argument that Cliqz's HumanWeb is at least less evil than traditional tracking systems, but it still relies on aggregation of data, and that is still a massive privacy hazard. Not to mention that the data that is sent¹ is still rich with datapoints that could be used for fingerprinting (the papers seem to suggest there are "heuristics" to detect and anonymize this, but said papers are pretty light on detail, and source code is meaningless since we don't know if it's what's actually running server-side). And also not to mention the rather sketchy distribution methods, like piggybacking on .NET downloads via chip.de in a manner that's been a hallmark of spyware since Y2K.

> they prefer to stick to their belief system, which is more comfortable and does not imply hard work.

"Am I out of touch? No, it is the children who are wrong."

----

> Sad story, extremely frustrating too, but there is nothing we can do now.

Not with that attitude. The search engine technology y'all developed is pretty interesting from a technical standpoint, and could be put to use (I'm sure DDG would be interested in adding it to their mix, or perhaps Ecosia could use it to diversify their Bing/Yahoo results the way DDG does with their in-house crawler). Same with Ghostery's more efficient network request blocking engine² (though it seems like Ghostery's development is still ongoing, no?), which could be useful in other ad and tracker blockers. Neither of these is much in the way of a money-maker (well, maybe the search one is, if y'all license it), but it'll at least help make the best of a lousy situation.

I get that it sucks - I've similarly felt the pain of a product into which I've put my blood, sweat, and tears ultimately failing. It's easy to write off the detractors and critics as simply uninformed masses who just "didn't understand how great of a product we have". It's harder to admit that the product wasn't great, or the name was terrible, or the market wasn't as big as anticipated, or what have you.

I'm confident that, being the bright and enthusiastic people y'all are, you'll find your footing again. Just, um, try to come up with a name that doesn't scream "adware" like "Cliqz MyOffrz" next time, lol. And maybe instead of writing off your criticisms as "FUD", actually examine why those criticisms persist and what you can do to better address them.

----

¹: https://cliqz.com/en/whycliqz/transparency

²: https://whotracks.me/blog/adblockers_performance_study.html