How LinkedIn detects browser extensions (github.com)
382 points by siddg 2716 days ago
20 comments

The repo says "A look at how LinkedIn spies on its users"

I'm not convinced this is LinkedIn spying on users... rather, it's them protecting their users from the spammy people using these extensions. There's not a single extension on that list that doesn't result in someone getting an unsolicited email.

Here's the full list; they're all spammy recruiting/sales extensions (nothing legit like uBlock or LastPass):

    daxtra
    SalesloftProspector
    SalesLoftCadence
    discoverly
    Ecquire
    Ebstabullhorn
    EbstaSalesforce
    ProspectHive
    talentbin
    Entelo
    Nimble
    amazinghiring
    colabo extension
    StepWells(colabo)
    found.ly
    datananas
    Linkedin-Hubspot Connector
    dux-soup(fixed)
    data Scraper
    aevy
    Lusha
    Lead Generator
    Candidate.ai
    Email Hunter
    Prospectify
    iMacros
    Prophet
    Leadiq
    HirEtuaL
    Contact Out
    Prospect.io
    saleslift.io
    Skrapp
    Slik
    CleverStaff
    Linked Helper
    Get Email
    Sourcehub
    Salestools
    SellHack
    Sourcebreaker
    turboHiring
    LinMailPro
    LinMailNavigator
    Leonard for Linkedin
    LinkeLead
    Loxo Social import
    Jlenty
    Social2Sugar
    Emply
    Linkedroid
    eLink Pro
    LinkMatch for zoho CrM
    LinkMatch for zoho recruit
    inkMatch for CatS
    LinkMatch for PCrecruiter
    LinkMatch for Pipedrive
    LinkMatch for Greenhouse
    Snapaddy Grabber
    ramper
    Linklead.io
    alore.io
    Hr-Skyen
    SeekOut
    Leadkedin
    icebreaker
    Spider for Linkedin
    recruiterNerd
    Crelate
    EyeMail
    Sales Lead Multiplier
    Email Finder
    Linkedin assistant Lily
    auto Connect tools Lily
    adapt Prospector
    Leadconnect
    Linkedbot
    People.camp
    instant data Scraper
    LinkMe tool
    adorito
    gay2sms
    Lusha (FireFox Extension)
    LinkedPro
    LeadGibbon
    Socialbff
And here's the code you can run yourself: https://pastebin.com/Ux684VtL
> There's not a single extension on that list that doesn't result in someone getting an unsolicited email.

Nimble is just a CRM. Their extension does not crawl for email addresses, as far as I remember. Why does LinkedIn need to "protect its users" from it? Isn't it rather to protect itself against the competition?

iMacros is a legit extension. But yeah, I guess there are recruiters using it to spam people.
Agree. iMacros is a completely fine macro recorder. Similar extensions like Kantu and Selenium IDE are not in the list.
well iMacros looks like another scraping tool, while those others look like testing tools that can be used for scraping.
>gay2sms

That's a malware, isn't it?

No idea, but if it is, and this is about protecting users from malicious addons, why did LinkedIn not just report that extension to Google?
> result in someone getting an unsolicited email.

That's pretty ballsy to bring up in a defense of LinkedIn.

Right. It ought to be "... result in someone getting an unsolicited email that didn't earn income for LinkedIn".
Well, another good reason for LinkedIn to do this is to protect their revenues. I heard of headhunters who don’t want to pay their (high) monthly subscription fees and instead “hack” their system. I guess “hacking” includes using these extensions.
Right. I once worked for a company that was trying to perform a business “matchmaking” service. We considered, as part of the implementation possibility-space, scraping people’s connections from LinkedIn in order to enhance our results. But LinkedIn has many advanced anti-scraping heuristics in their backend; this is just one of many. So we scrapped that option (before ever getting around to considering the ethics of it.)
Can you link to where they say that? I would figure someone doing something so helpful for users would at least document it. There's no reason to be surreptitious when doing such a favor. One wonders if they'll start offering a LinkedIn AntiVirus download with such an altruistic approach towards protecting users from what they have installed.
I think you're misunderstanding. LinkedIn isn't protecting the people with the extensions installed; they're protecting users FROM the people with the extensions.
Where they'll happily throw the same people under the bus if the user with the extensions installed is paying for an expensive recruiter license. Curious!
Ah, as an anti-scraper/anti-bot method, every user has all these local network requests made? Maybe it's the true reason, maybe not. Transparency is key here to assume anything more than the worst. Of course any of the rest of us with a modicum of smarts would just side load a custom extension via CLI args (or we'd just browser automate, headless if not detected). Even given the most generous justification, it reeks of careless decision makers playing whack-a-mole (likely fruitlessly) with the users in the crossfire.
I think it's a cat and mouse game. The more that Linkedin publishes about their anti-spam techniques, the more information spammers have to try to evade those anti-spam techniques.
I know one company at least that sets up a proxy physically near their clients to obscure that they have a team in the Philippines manually assisting clients with their LinkedIn profiles.

Ultimately they need to police actual negative behaviour, not the mechanics of how people are doing it. But that means potentially restricting engagement of some of their most active users as well.

It can seem that way with server-side anti-scraping techniques with brute force detection and the like. But at some point you have to accept that playing the game on the client-side needs to stop escalating once you're making dozens of local extension resource requests in a user's browser. It makes me want to publish and maintain a legit scraper for LinkedIn that replicates human interaction. They'd DMCA the repo I'm sure, but it goes to show who fights against the open web. I see a "get X, Y, and Z features for free when you use LinkedIn Desktop instead of the website" coming.
Why do we accept this argument of obscurity, when discussing security vulnerabilities proper doesn't elicit the same response?

Why is obscurity OK in these situations? Wouldn't we all benefit from removing scammers if everyone legitimate worked together in public? It's easy to defeat a single adversary. It's mighty hard to defeat a cooperating team.

LinkedIn doesn’t publish anything about their anti-spam techniques. This was published by a third party, because it was a client-side feature that third parties could discover. Most of LinkedIn’s anti-bot logic is on the backend and completely opaque.
In my mind I am associating this with LinkedIn's failed attempt to keep scrapers off their website by suing them (ref: https://arstechnica.com/tech-policy/2017/08/court-rejects-li...)

Other companies are making money with these extensions on LinkedIn's website and LinkedIn is not happy about it

Calling this "nefarious-linkedin" when it's obvious that LinkedIn is trying to protect itself from unauthorized data collection shows that the developer is either seeking attention or didn't really look into the purpose of those extensions (https://github.com/dandrews/nefarious-linkedin/pull/1)
But how is this data accessible to the extension? I’m not an expert, but it seems that this data has to be publicly available for an extension to find and parse it. Extensions don’t have magic auth rights or credentials.
Extensions have the same auth rights as your logged-in account (the ability to see people who are out of network, for example). It’s against LinkedIn’s ToS to scrape data.
This should go both ways. It is against my ToS for LinkedIn to scrape which extensions I have installed.
I'm on the anti-LinkedIn side of this scraping debate.

But that said, LinkedIn never agreed to your ToS.

True, and I accept this is a potentially good legal refutation of this kind of argument. However, I do consider ToS-es untenable and unjust because of this power asymmetry.

If my computing node is interacting with your computing node, we should either both be able to put restrictions on the use of obtainable information or neither.

Yes, I get that the extension operates within the user's auth realm. But still it should not be able to access data you as a user cannot access. Maybe this is already enough to do damage though.
I don’t get it. How can a browser extension mine data that otherwise is inaccessible? This should be covered by basic RBAC. Or are they just convenient scrapers, saving time but otherwise not accessing privileged information? If so, the LinkedIn story about “protecting our users” seems a bit shaky.
The extensions are basically bots to collect info for the user with the extension installed, not steal info from that user. Most are either scraping email, names, and job titles as quickly as a bot can, or mass sending out messages to users based on some criteria.

Here's a video for one of the extensions https://www.youtube.com/watch?v=2XvtuZjblCc (Warning: loud music)

> The extensions are basically bots

No, most appear to be plugins for ATS/CRMs, which allow recruiters -- having found a lead on LinkedIn -- to then add them to their CRM. This is profoundly different.

For anyone who is asking what/who LinkedIn are protecting with this, it's not the users with the extensions installed, it's to protect the other users on the sites. I poked through some of the listed extensions and most are basically bots that you can turn on that will crawl through LinkedIn pages very quickly and either collect info (like email addresses) or send out messages to other LinkedIn users.

I found this video for one of the extensions that is a good example of what I'm talking about https://www.youtube.com/watch?v=2XvtuZjblCc (Warning: Loud music)

In 2015 I wrote and published a Chrome extension for LinkedIn that calculated the age of a person and put that age next to the name in their LinkedIn profile. It quickly went viral and showed up in several places including Product Hunt.

Someone from BuzzFeed reached out to me asking questions about it and then later that day wrote an article claiming that LinkedIn had asked me to take it down (until that point they hadn't). That night I received a cease and desist letter, so I took it down.

There were many valid reasons to ask for my extension to be removed, but I never got the impression that they were doing it to protect the users whose age was being augmented or at least it didn't feel that was their angle.

It felt more like "this data is ours, so back-off". Just to be clear, I'm not saying that they were rude in their communications or anything like that. But the C&D letter focused a lot on the techniques and uses of my extension and not so much on the "this violates user's privacy" or "this is not representing accurate data".

I just think that in general LinkedIn doesn't like people poking around and trying to scrape data in any way. In the end, that's their most valuable asset (users' data).

For anyone curious, I still have the website: http://www.whoisjuan.me/age-insight-linkedin/

C&D letters are written by lawyers. They don't appeal to your empathy over the PII of other users, they state facts and appeal to the legal standing LinkedIn (or $company...) has over the data being used.

That said, I have no idea of the reasons LinkedIn sent you a C&D. It could well be any of the proposed options, or something else entirely. I'm just highlighting that the language in a C&D will rarely give any indication of intent, at least not "well written" ones anyway.

>> In the end, that's their most valuable asset (users' data)

Some might say it's their only valuable asset...

> I poked through some of the listed extensions and most are basically bots that you can turn on that will crawl through LinkedIn pages very quickly and either collect info (like email addresses) or send out messages to other LinkedIn users.

I'm going to take the top ten from the list as an example:

daxtra -- Nothing like what you've described, plugin for a CRM

SalesloftProspector/SalesLoftCadence -- I don't see any crawling capability at all

discoverly -- Nothing like what you've described -- more like rapportive

Ecquire -- Nothing like what you've described, plugin for a CRM

Ebstabullhorn / EbstaSalesforce -- Nothing like what you've described -- plugins for Bullhorn and Salesforce CRMs only

ProspectHive -- apparently defunct, no idea

talentbin -- this is a social media aggregator

Entelo -- ATS plugin

Ignoring everything else, it seems a bit weird a page can make requests to an extension's assets without originating from that extension.
I guess this comes down to extensions that inject code / modify the page.

Extensions can choose if their assets are public or private, and if they reference the asset from injected code - it needs to be public.

It sounds like a better solution might be to track the injected / modified code, and only allow it to read the assets. But I'll bet there is some tradeoff i've no clue about preventing that from happening.

Imagine an extension modifying a page and adding an image. How would it allow the image to load if that wasn’t possible?
I would have hoped for some shared secret approach where the extension can generate one-time use urls for their bundled resources on demand and use those instead of easily predictable urls.

It seems that extensions like ad blockers that are explicitly targeted by such detection methods have ways to work around that (see https://github.com/gorhill/uBlock/blob/master/src/web_access...). I honestly would have expected that to be the enforced default behavior.

I was thinking if an image is injected, it'd be injected by a script loaded from the plugin thus trusted.
It’s a logical thought but that isn’t how it works.

A script doesn’t really inject an image, it injects an image tag which contains a reference to the image. As the image gets loaded there is no check who created the tag.
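A minimal TypeScript sketch of the probing this makes possible: since nothing checks who created the tag, a page can itself request a known extension's web accessible resource and see whether it loads. The `ExtensionSignature` shape, the `canLoad` callback, and any IDs or paths passed in are assumptions for illustration, not LinkedIn's actual code or the contents of the repo.

```typescript
// Hypothetical signature: a display name plus one web accessible resource.
// Any ID or path used with this type is a placeholder, not real data.
interface ExtensionSignature {
  name: string;
  id: string;   // Chrome's global extension ID
  path: string; // a resource the extension exposes to web pages
}

// Build the URL a page would try to load for a given signature.
function probeUrl(sig: ExtensionSignature): string {
  return `chrome-extension://${sig.id}/${sig.path}`;
}

// `canLoad` is injected so the logic can run outside a browser. In a page it
// could be implemented by creating an <img>, setting its src to the URL, and
// resolving true on the load event or false on the error event.
async function detectExtensions(
  signatures: ExtensionSignature[],
  canLoad: (url: string) => Promise<boolean>,
): Promise<string[]> {
  const found: string[] = [];
  for (const sig of signatures) {
    // A successful load implies the extension is installed and exposes the file.
    if (await canLoad(probeUrl(sig))) {
      found.push(sig.name);
    }
  }
  return found;
}
```

The probe only works for resources the extension has chosen to make public, which is why extensions that inject images or stylesheets into the page are the ones that can be detected this way.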

> LinkedIn violates their own users' privacy in an effort to detect the usage of browser extensions. At the time of writing this, LinkedIn is scanning visitors for 38 different browser extensions.

No it is defending against malicious actors from abusing its API.

> No it is defending against malicious actors from abusing its API.

I do not really understand the concept of "abusing an API". If an API is amenable to a "bad" use, it seems entirely to be the fault of the API designers, not of its users. The designers built an API that enabled a usage that they did not want. That is their fault, how could it be otherwise?

That is exactly what LinkedIn is doing: they are preventing bad actors from calling their API by essentially blacklisting them. They can't be blacklisted via IP since they are scattered across the internet, so they are banning them productively. Simple and easy.
Why not simply rate-limit everyone reasonably?
Changing the name of the extension resources and any extra elements they add to the page would be enough to stop this. (It reminds me of another "trick" pages like to use: randomising the element IDs. Easily defeated by searching for other properties of the desired element.) Just like DRM, it's a stupid cat-and-mouse game, and the mice will always win...
Even if the intent by LinkedIn is legit this will soon get used by data tracking scripts to further de-anonymise people
Is that inherently wrong if the website doesn't want to serve anonymous clients?
There are already a dozen ways to fingerprint users, I don't think this is specifically revolutionary.
OK, but why does LinkedIn scan extensions?
To flag accounts that are scraping data or "revealing" email addresses.

Negative view: they're blocking people from circumventing their paid features

Positive view: they're protecting their other users from getting spammed

A lot of these are used as CRM type applications where people would love it if LinkedIn just charged for access to a more comprehensive API instead. LinkedIn's messaging UI sucks, and ironically one of the reasons to want to use CRMs like Nimble to interact with your LinkedIn connections is to be able to better track communication with them so you don't spam. But of course people will use it to spam too.

If LinkedIn offered API access to messaging in a way that let CRMs work with them instead of feeling forced to circumvent them, I think most who want to use it legitimately would be perfectly happy to have LinkedIn impose various usage limits and protections, even if paid.

They should see this as revenue potential: there is lots of potential to get companies with legitimate reasons for more integration than the current API to upsell their customers on paid LinkedIn features if they are able to offer it in an approved way, and I bet many would be happy to let LinkedIn monitor how it's used.

If they try to block access instead, they'll find more and more companies keep offering the same, but manually.

They want to block tools that offer functionality similar to their paid offerings.
LinkedIn contains lots of personal data, a large part of which is only available to users who are signed in and/or paid members. They want to protect this information from potential exfiltration by these extensions and their backing companies.
They have zero scruples. This is a company infamous for spamming people.
And, do they need to do it or are they just data mining?
If they don't need it, isn't it illegal to collect that data under the GDPR?
I don't see the repo actually saying they're collecting this data, i.e. sending it back to their servers. It may just be a "reverse adblocker" - a list of signatures of extensions that the website's JS will try to interfere with on the user end.
Who is going to stop them? I'm sure the data is worth >0 to them (or someone)
To fingerprint you via what extensions you have installed to track your browsing habits.
More metadata to shape information... do you have ublock, authy, lastpass, bitmoji, etc. Could be anything from metrics, to useful interactions.

Got the dropbox extension, show an option to upload your resume from dropbox directly. etc.

Blocking ads, show integrated ads through a secondary channel.

I really don't understand the downvotes.
LinkedIn's reputation isn't so great. Primarily because they harvest user mailboxes, and spam endlessly about Mirimir (for example) inviting recipients to join their associate at LinkedIn. And it's not just annoying. Sometimes it hurts people's careers.
Oh, LinkedIn's reputation is deplorable... what they did to bypass security on iOS (and I think Android too) are particularly interesting (mail proxy). I'm not saying that metadata collection is good, or that there aren't nefarious reasons... I stated that was one reason, and it could be to offer features.

I only created a linkedin account to stop all the email invites... and even then, refuse to install their app (links pervasive in mobile web) and only accept connections to those I've met personally, and very few recruiters.

Fair enough. But some people just downvote anything even neutral about something that they hate.

That's a funny story. But I have a funnier one. Not long ago, maybe the last time LinkedIn came up on HN, I created a test LinkedIn account as Mirimir. Or at least, I attempted to. Given that I use VPNs, I got a cellphone text authentication prompt. But Mirimir doesn't have a cellphone, so I blew it off.

And here's the funny part. A few days later, Mirimir received email from LinkedIn, inviting him to join Mirimir's network on LinkedIn!

I admit one of the extensions on the list is mine. But it is not as malicious or spammy as some like to picture it. Most of them are complements, add-ons to help the user with their CRM. I don't know of any intended to steal data (I believe they would use scrapers or other ways instead of asking users to pay for an extension). There are well-known CRMs like Hubspot or SOHO that aim to sync data. Yes, some others are used to send messages to connections, just as unsolicited as InMails, the LinkedIn paid version (but at least it is to connections). They also block extensions that block their ads and extensions that help users filter out "sponsored" content (we did that). Regarding GDPR, even LinkedIn says it is not actually their data; the users are the data controllers (owners): https://legal.linkedin.com/dpa .

Obviously, this is not OK with LinkedIn because they are a walled garden and not an open platform. The point is they do not let the user decide, customize or adapt their experience to suit their needs. Any feature that is not on their revenue agenda gets killed even if thousands of users cry for it (happens regularly), and they do not let anyone else offer it. Nobody likes spam, but it is up to the user not to do it. It is as if Gmail would not let you send an email to more than one person at a time or be connected to any other app (yes, I know there are limits in Gmail).

Notice that it is not the legal way that LinkedIn takes to stop these services because, in reality, they are a monopoly (and, as pointed out earlier, courts have ruled against LinkedIn). Neither do they use an educational or marketing path, telling the users why it is better FOR THEM not to use those extensions. No, they use FUD (fear, uncertainty and doubt) to scare users and cancel the LinkedIn accounts of the people who create this "competition". It happened to me, to the people of hunter.io, findthatlead and many others. Mafia style.
This is not a moral justification from me, it is a business decision to offer extensions to give capabilities that people want.
Looks like they are trying to block spiders and protect its users
No, they're trying to protect their LinkedIn Recruiter license revenue.
> Furthermore, there's no good reason to use web accessible resources in an extension! You can always find a solution to your problem that does not require them.

How would I e.g inject an extension-provided image into a web page without using web accessible resources?

The only ways I can think of would be copying the image to a blob or drawing it on a canvas - both seem significantly more complex than just injecting an IMG tag and would still be detectable as side effects.

I'm not familiar with writing browser extensions, but data URI comes to mind.
Ah, right, I forgot those. That's true of course.

I think you could still use them for side-effect detection (watch for images/scripts/etc with a known data uri suddenly appearing in your DOM) - but at least you couldn't actively query it without the extension doing anything.
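A rough TypeScript sketch of that side-effect detection: the page compares the `src` values it finds in its own DOM against data URIs it already knows a given extension injects. The function name and the `knownDataUris` list are hypothetical; a real page would have to collect those signatures in advance by inspecting each extension's injected markup.

```typescript
// Return the known data URIs that appear among the `src` values seen in the DOM.
// In a browser, `srcValues` could be gathered from the page's <img> elements,
// or incrementally from a MutationObserver callback.
function findInjectedDataUris(
  srcValues: string[],
  knownDataUris: Set<string>,
): string[] {
  return srcValues.filter(
    (src) => src.indexOf("data:") === 0 && knownDataUris.has(src),
  );
}
```

Unlike probing a web accessible resource, this only fires after the extension has actually modified the page, so an extension that stays idle is not revealed.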

How is a webpage able to query the local file system? That sounds pretty bad.
It doesn't, it queries the local assets of installed extensions. Chrome (and I guess other browsers?) provides a way to do this, so the HTML etc. injected by an extension can reference assets shipped with the extension.
You can query a static asset of an extension by its path: chrome-extension://extensionid/asset.css
I am really in two minds about Linkedin, I cancelled my account years ago after getting spammed by recruiters, this could be an attempt to clean up but looking quite sinister in the attempt
Linkedin has been an issue for years for me, because they simply disclosed your email to anyone connected. This enables some people and/or corporations to scrape profiles and build spam email databases. After being annoyed about this, I started to change my linkedin dedicated email address frequently, 4-5 times a year. The conclusion was obvious: less than a few days after the change, I began receiving spam and proposals on this new dedicated email address, thus confirming the email scraping problem.

Yesterday I went back to Linkedin to reconfigure a new email address, and found that the account settings now incorporate a setting to hide your email address from everyone (inactive by default...). I've enabled it and changed again to a new dedicated email address, to see if it is true. I hope this time Linkedin did things right.

Maybe I should try again, I am not looking to hire or be hired so not really sure if there is a point anymore
The written tone used in the repo comes off as too drastic, especially as it only reports the collection of analytics on how LinkedIn users use the website.

Is the detection result reported back to LinkedIn?

In their [Privacy Policy](https://www.linkedin.com/legal/privacy-policy#your_device_an...) they do mention they collect information on "web browser and add-ons".

This reminds me of similar approaches used in other environments. For example in the game industry, anti-cheat techniques of detecting the running software in mobile devices to flag users. How do you think this differs?

Talking of LinkedIn. Any suggestions of how I can bulk-remove contacts? I was wondering if there’s a Chrome Extension? I’m assuming all I’m missing is the motivation to script it?
One aspect is that LinkedIn is protective of plugins that incidentally cover up their own ads. Notably several entries on this list once had such grievances filed against them.
Is this issue unique to Chrome? Does it happen with Firefox?
I do have the same localStorage item in my Firefox. The shown way of decoding the content works too.
The technique of attempting to load web accessible resources does not work in Firefox. For starters, Firefox uses moz-extension: instead of chrome-extension:, which is obviously trivial to adapt to. But Chrome uses the extension's global identifier in those URLs, while Firefox uses a locally generated identifier, specifically to avoid this sort of fingerprinting.
Do they detect the extension... What do they do after that? Hide the email?
Why is this dangerous?
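To make the difference concrete, here is a small TypeScript sketch of how the base URL for an extension's resources is formed in each browser. The `Browser` type, the function name, and any IDs passed in are placeholders for illustration only.

```typescript
type Browser = "chrome" | "firefox";

// Build the base URL for an extension's bundled resources.
// For Chrome, `id` is the extension's global ID, identical on every install,
// so a web page can hard-code it. For Firefox, `id` is an identifier generated
// locally per install, which a web page cannot know in advance.
function extensionBaseUrl(browser: Browser, id: string): string {
  const scheme = browser === "chrome" ? "chrome-extension" : "moz-extension";
  return `${scheme}://${id}/`;
}
```

A fixed list of probe URLs therefore carries over between Chrome users, while the same list would be useless against Firefox because each install yields a different URL for the same extension.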