| HN Mirror



	by rdoherty 175 days ago
	Skimming the list, looks like most extensions are for scraping or automating LinkedIn usage. Not surprising as there's money to be made with LinkedIn data. Scraping was a problem when I worked there, the abuse teams built some reasonably sophisticated detection & prevention, and it was a constant battle.

6 comments

cxr 175 days ago

In order to create the data source that LinkedIn's extension-fingerprinting relies on to work, someone (at LinkedIn*?) almost certainly violated the Chrome Web Store TOS—by (perversely*) scraping it.

* if LinkedIn didn't get it from an existing data source


direwolf20 174 days ago

Programmers don't appreciate the fact that you can just violate terms of service. You can just do it. It's okay. The police won't come after you. Usually.


franga2000 174 days ago

I think the point is more "in order to prevent people from scraping their site, which is against their ToS, they scraped some other site, against its ToS".


direwolf20 174 days ago

Read "in order to have more money, I did things that caused other people to have less money"


awakeasleep 174 days ago

When someone who sees the world through a lens of morality notices somebody operating without morality, it is startling.

And it deserves a call out! The benefits of being so cynical that you’re numb to it come with a lot of tradeoffs.


mosselman 174 days ago

Indeed. I read a lot of comments like these one you are responding on HN. It seems like there is a type of person who thinks that writing down what their rules are has some magical power.

“This isn’t what it was intended for”. Who cares?

A long long time ago in a galaxy far far away I would encounter warnings on pirating websites saying “If you are an FBI agent you are not allowed to continue on this site”. Imagine their utter disbelief and shock if they were to be arrested by an FBI agent that clicked past the warning anyway.

I agree it must be programmers as a type who like rules a lot and think what a perfect world it could be if people would follow them.


cxr 173 days ago

I'd ask who you think you have me confused for or where you got that quote from, but I know how little it matters insofar as getting you to recognize whatever delusion led to your comment.


mosselman 171 days ago

I am sorry, I wasn't reacting to you I was reacting to the commenter who said:

"Programmers don't appreciate the fact that you can just violate terms of service."


cxr 165 days ago

> comments like these one you are responding

That's my comment.


bastawhiz 174 days ago

3000 extensions is few enough that a small team could download each extension manually over a few months. You don't need to scrape at all.


cxr 174 days ago

In the first place, no one said they needed to, only that they probably did.

Secondly, it's not "3000 extensions". They didn't somehow magically divine that the 2953 (+/-47) extensions we see here were the ones that they needed to download in order to be able to exploit the content-accessible resources described in their extension manifest. They looked at a much larger set, and it got filtered down to these 2953 that satisfied the necessary criteria.
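To make the filtering step concrete, here is a minimal sketch of how a larger set of downloaded manifests might be narrowed to extensions that expose a probeable file. It is not taken from the repo or from LinkedIn. The function names and the `{ id, manifest }` input shape are hypothetical, and it deliberately ignores the Manifest V3 `matches` restriction, so it would over-include extensions whose resources are not reachable from a given site.

```javascript
// Hypothetical sketch: names and input shape are assumptions, not the researcher's code.

// Collect concrete (non-wildcard) web-accessible resource paths from a manifest.
// Manifest V2 lists plain strings; Manifest V3 lists objects with a `resources` array.
function probeableResources(manifest) {
  const declared = manifest.web_accessible_resources ?? [];
  const paths = declared.flatMap((item) =>
    typeof item === "string" ? [item] : item.resources ?? []
  );
  // A probe needs an exact file name, so wildcard patterns are dropped.
  return paths.filter((path) => !path.includes("*"));
}

// Keep only extensions with at least one concrete resource, as { id, file } pairs.
function filterCandidates(extensions) {
  return extensions
    .map(({ id, manifest }) => ({ id, file: probeableResources(manifest)[0] }))
    .filter((entry) => entry.file !== undefined);
}
```

Under these assumptions, any extension that declares no web-accessible resources, or only wildcard patterns, falls out of the list, which is one way a large candidate set could shrink to a few thousand entries.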


bastawhiz 174 days ago

Lol no, did you even read the list? You could pay someone to just search "LinkedIn" and "talent" and "recruiting" on the chrome web store and download each extension. It's probably harder to automate this than it is to do it manually. This is something you could develop in an afternoon and pay a small team of people to do for pennies on the dollar. Even ten thousand extensions is nothing. Spread that over years and this is trivial.


cxr 173 days ago

For someone choosing to be so obnoxiously condescending, you are excruciatingly stupid.


winddude 175 days ago

a problem for linkedin != "a problem". The real problem for people is the back room data brokering linkedin and others do.


bryanrasmussen 175 days ago

From the code, it doesn't look like they do anything if they have a match; they just save all the results to a CSV for fingerprinting?


cxr 175 days ago

"The code" here you're referring to (fetch_extension_names.js[1]) isn't and doesn't claim to be LinkedIn's fingerprinting code. It's a scraper that the researcher behind this repo wrote themselves in order to create the CSV of the data that they're publishing here.

LinkedIn's fingerprinting code, as the README explains, is found in fingerprint.js[2], which embeds a big JSON literal with the IDs of the extensions it probes for. (Sickeningly enough, this data starts about two-thirds of the way through the file* and isn't the culprit behind the bulk of its 2.15 MB size…)

* On line 34394; the one starting:

    const r = [{
                id: "aacbpggdjcblgnmgjgpkpddliddineni",
                file: "sidebar.html"

1. <https://github.com/mdp/linkedin-extension-fingerprinting/blo...>

2. <https://github.com/mdp/linkedin-extension-fingerprinting/blo...>
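For readers unfamiliar with the technique, a rough sketch of how a page could probe for extensions from a list of `{ id, file }` entries like the one quoted above. This is not LinkedIn's code. The function names are hypothetical, and the approach (requesting a `chrome-extension://` URL and treating a successful load as a signal) is the commonly described way such probing works, assumed here rather than confirmed from fingerprint.js.

```javascript
// Hypothetical sketch: NOT LinkedIn's code. Only the { id, file } entry shape
// comes from the quoted literal; everything else is an assumption.

// Build the URL a page would request to test for one extension.
function probeUrl(entry) {
  return `chrome-extension://${entry.id}/${entry.file}`;
}

// Probe every entry and return the IDs whose resource loaded.
// `fetchFn` is injectable so the sketch can be exercised outside a browser.
async function detectExtensions(entries, fetchFn = fetch) {
  const results = await Promise.all(
    entries.map(async (entry) => {
      try {
        await fetchFn(probeUrl(entry));
        return entry.id; // request resolved: extension is probably installed
      } catch {
        return null; // request rejected: not installed, or resource not exposed
      }
    })
  );
  return results.filter((id) => id !== null);
}
```

The result is only a list of matching IDs. What a site does with that list afterwards is a separate question that this sketch does not address.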


bryanrasmussen 174 days ago

Thanks, my fault for not reading the README and just doing a quick read of the code.


tlogan 174 days ago

Looking at the list, it does not seem really “sophisticated”. It is just a list based on names (for example, whether there is “email” in the name). The majority of extensions do not even ask for permission to access linkedin.com.


RHSman2 174 days ago

I had the pleasure of scraping LinkedIn for a client. Great fun.


hsbauauvhabzb 175 days ago

Won't someone think of poor little LinkedIn, a subsidiary of one of the largest data brokers in the world?


charcircuit 175 days ago

Why frame what you are trying to say like that? Businesses of all sizes deserve the ability to protect their businesses from abuse.


jmward01 175 days ago

Do they respect my data? Why do they get to track me across sites when I clearly don't want them to but someone can't scrape their data when they don't want them to. Why should big companies get the pass but individuals not? They clearly consider internet traffic fair game and are invasive and abusive about it so it is not only fair to be invasive and abusive back, it is self defense at this point.


hsbauauvhabzb 175 days ago

They don’t need to track your web browser when they’re owned by Microsoft, because they track every action at a lower level.


0x1ch 175 days ago

Weird, I don't use Windows as an OS but have linkedin. I'd believe the concern and disregard of Linkedin's concern is fair game.


missingdays 175 days ago

What lower level? Microsoft owns the internet?


zelphirkalt 175 days ago

The operating system. For example see the Windows 11 screenshot debacle/scandal.


brookst 173 days ago

“They” is an incredibly useful tool.


thesmtsolver2 174 days ago

You do realize anti-scraping measures are one way of protecting your data too?


pluralmonad 174 days ago

In this context, "protecting" means the interest of linkedin who aggressively sells the data. Users that give data to linkedin are not protecting their data either way.


john-h-k 175 days ago

Because you signed up to a set of terms and conditions saying LinkedIn can use your data in this way


inetknght 174 days ago

What if I signed up before those ToS said they could use my data in this way?

Oh right, companies change ToS and EULA and "agreements" without notice, without due process, and without recourse.

I have no problem changing how I use "their" data in such situations.


RulerOf 174 days ago

> Oh right, companies change ToS and EULA and "agreements" without notice, without due process, and without recourse.

Companies change their terms of service all the time. They usually send emails about it.

I've responded to decline them a handful of times and asked for my account to be deleted. I chuckle slightly at the work it creates, but sometimes it has been easier to close an account that way.


hsbauauvhabzb 174 days ago

No one likes paying taxes but they still do it. They could just not work and not have money and therefore not need to pay tax.


pluralmonad 174 days ago

Except what you have to pay each year for the privilege of staying in "your" house.


echelon 175 days ago

I didn't want the web to turn into monolithic platforms. I abhor this status quo.

You cannot function without these enterprises, but that doesn't mean they're ideal or even ethical.

Microsoft wins because of network effects. It's impossible to compete. So I think it should be allowed to assail their monopoly here by any means. It's maximally fair for consumers and for free markets.

Ideally capitalism remains cutthroat and impossible to grow into undislodgeable titans.

Even more ideally, this would become a distributed protocol rather than a privately owned and guarded database.


direwolf20 174 days ago

That doesn't actually mean anything


ronsor 175 days ago

I think they framed it this way because they don't consider scraping abuse (to be fair, neither do I, as long as it doesn't overload the site). Botting accounts for spam is clear abuse, however, so that's fair game.


hsbauauvhabzb 175 days ago

No, I consider all data collection and scraping egregious. From that perspective, LinkedIn is hypocritical when Microsoft discloses every filesystem search I do locally to bing.


dylan604 175 days ago

Are you not scraping a site with your eyeballs when you view a site?


hsbauauvhabzb 174 days ago

By that logic I can charge you for looking at me.


RockRobotRock 175 days ago

When they scrape, it’s innovation. When you scrape, it’s a felony.


nitwit005 175 days ago

I'm sure there are issues with fake accounts for scraping, but the core issue is that LinkedIn considers the data valuable. LinkedIn wants to be able to sell the data, or access to it at least, and the scrapers undermine that.

They could stop all the scraping by providing a downloadable data bundle like Wikipedia.


sidrag22 174 days ago

Thinking more about it, I don't think it's a terrible thing that they prevent scraping. Their listings are already suffering from being flooded with garbage applications and having to sift through tons of noise. Allowing scraping would just amplify that and make the platform almost entirely worthless.

I "scrape" LinkedIn in a roundabout way for personal use, and really what I've found is that I should just maybe not bother at all. I can't get through the noise even when I'm applying at places that heavily match my skillset, and just get automated rejection emails.


compiler-guy 175 days ago

LLMs scrape Wikipedia all the time, or at least attempt to.

The data bundle doesn't help that at all.


nitwit005 174 days ago

That's true, the normal scraping would still happen, but it would eliminate this side business of trying to re-sell LinkedIn's data.


direwolf20 174 days ago

What is abuse? Is it anything that reduces my profit margin? Or is it anything that makes the world a worse place? The Flock CEO called Deflock terrorism, is he right?


mistrial9 174 days ago

this exchange -- obvious critical / perhaps insurrection speech versus a stable voice of business economics -- should be within the purview of an orderly and predictable legal environment. BUT things moved quickly in the phone battles. Some people say that the legal system has never caught up to the data brokering, and in fact the surveillance state grew by leaps and bounds.

So, reasonable people may disagree. This is a fine place to mention it .. what if individual profiles built at LinkedIn are being combined with illegitimate and even directly illegal surveillance data and sold daily? Everyone stand up and salute when LinkedIn walks in the room? there has to be legal and direct ways to deal with change, and enforcement to complete an orderly and predictable economic marketplace.


duskdozer 174 days ago

>BUT things moved quickly in the phone battles. Some people say that the legal system has never caught up to the data brokering, and in fact the surveillance state grew by leaps and bounds.

Partially by discrepancy in how responsive you can be or comprehensive you must be to win the next round of cat-and-mouse, and partially because a private/corporate surveillance apparatus is useful to a government that might otherwise be hampered by constitutional bounds.


sellmesoap 175 days ago

We enjoy the fruits of an LLM or two from time to time, derived from hoards of ill-gotten data. LinkedIn has the resources to attempt to block scraping, but even at the resource scale of LI I doubt the effort is effective.


charcircuit 175 days ago

I am not denying that scraping is useful. If it wasn't people wouldn't do it. But if the site rules say you aren't allowed to scrape, then I don't think people should be hostile towards the people enforcing the rules.


ronsor 175 days ago

Well, they can try to enforce the rules; that's perfectly fair. At the same time, there are many methods of "trying" which I would not consider valid or acceptable ones. "Enforcing the rules" does not give a carte blanche right to snoop and do "whatever's necessary." Sony tried that with their CD rootkits and got multiple lawsuits.


cyanydeez 175 days ago

the abuse>using the information they publish to the public


b112 175 days ago

Yes, until it becomes abusive and malignly affects innocents.


schmidtleonard 175 days ago

The big social media businesses deserve a Teddy Roosevelt character swooping in and busting their trusts, forcing them to play ball with others even if it destroys their moats. Boo hoo! Good riddance. World's tiniest violin.

This is a popular position across the aisle. Here's hoping the next guy can't be bought, or at least asks for more than a $400M tacky gold ballroom!


xp84 175 days ago

I mean, regardless of who they are or even if you don’t like what LinkedIn does themselves with the data people have given them, the random third parties with the extensions don’t additionally deserve to just grab all that data too, do they?


mathfailure 175 days ago

Surely they do! The data is in the public internets, aren't they?


ronsor 175 days ago

They'd put Widevine or PlayReady DRM on the website if they could, I'm sure.


bigfishrunning 175 days ago

why can't they?


direwolf20 174 days ago

because they're only for video files?


hsbauauvhabzb 175 days ago

I say the same thing about my start menu sending every action I perform to bing.


josephg 175 days ago

Eh. I worked at a company which made an extension which scraped LinkedIn. We provided a service to recruiters, who would start a hiring process by putting candidates into our system.

The recruiters all had LinkedIn paid accounts, and could access all of this data on the web. We made a browser extension so they wouldn’t need to do any manual data entry. Recruiters loved the extension because it saved them time.

I think it was a legitimate use. We were making LinkedIn more useful to some of their actual customers (recruiters) by adding a somewhat cursed API integration via a Chrome extension. Forcing recruiters to copy and paste didn’t help anyone. Our extension only grabbed content on the page the recruiter had open. It was purely read only and scoped by the user.


xp84 175 days ago

Doesn't sound like your operation was particularly questionable, but I can imagine there must be some of those 3,000 extensions where the data flow isn't just "DOM -> End User" but more of a "DOM -> Cloud Server -> ??? -> Profit!" with perhaps a little detour where the end user gets some value too as a hook to justify the extension's existence.


RHSman2 174 days ago

I started there, but it felt like a dodgy way (as it could be seen to be illegal). We then just went all official and went through Google search APIs with LinkedIn as the target. Worked a treat and was cheaper than Recruiter!!!

So when you pay the highest scraper, it’s ok! Same data, different manner.
