by greenbandit 783 days ago
I use web scraping to identify and monitor fraud.

Exhibit A: https://archive.ph/0ZUA8

This website is used to recruit people to set up "lead generation" Google Business Profiles and leave paid reviews.

Exhibit B: https://archive.ph/WWZuw

This is an example of the Craigslist ad used to initially attract people to the website above.

Exhibit C: https://archive.ph/wip/7Xig4

This is one of the Google Maps contributors who left paid reviews.

If you start with the reviews on that profile, you'll find a network of Google Business Profiles for fake service-area businesses connected through paid reviews.
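
To make the "start with one profile and follow the reviews" idea concrete, here is a minimal sketch of the traversal as a breadth-first search. It assumes the review data has already been collected into (reviewer, business) pairs; the function name `connected_network` and all of the sample identifiers are hypothetical, and nothing here fetches anything from Google.

```python
from collections import defaultdict, deque


def connected_network(reviews, seed_reviewer):
    """Return (reviewers, businesses) reachable from seed_reviewer via shared reviews."""
    # Index the review pairs in both directions.
    by_reviewer = defaultdict(set)
    by_business = defaultdict(set)
    for reviewer, business in reviews:
        by_reviewer[reviewer].add(business)
        by_business[business].add(reviewer)

    seen_reviewers = {seed_reviewer}
    seen_businesses = set()
    queue = deque([seed_reviewer])
    while queue:
        reviewer = queue.popleft()
        for business in by_reviewer[reviewer]:
            if business in seen_businesses:
                continue
            seen_businesses.add(business)
            # Every other reviewer of this business joins the network.
            for other in by_business[business]:
                if other not in seen_reviewers:
                    seen_reviewers.add(other)
                    queue.append(other)
    return seen_reviewers, seen_businesses


# Hypothetical sample data: reviewer_c and bakery_1 are unrelated to the seed.
reviews = [
    ("reviewer_a", "locksmith_1"),
    ("reviewer_a", "plumber_1"),
    ("reviewer_b", "plumber_1"),
    ("reviewer_b", "towing_1"),
    ("reviewer_c", "bakery_1"),
]
reviewers, businesses = connected_network(reviews, "reviewer_a")
```

With this sample, the seed reviewer pulls in a second reviewer through a shared business, while the unrelated reviewer and business stay out of the result.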

Web scraping allows me to collect this type of data at scale.

I also use scraping to monitor the status of fake listings. If they are removed, the actor behind them will often get them reinstated. This allows me to report them again.
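
The reinstatement check can be sketched as a comparison between two status snapshots. This is only an illustration under assumptions: the snapshots are plain dicts mapping a listing ID to "live" or "removed", and the name `find_reinstated`, the status labels, and the listing IDs are all hypothetical.

```python
def find_reinstated(previous, current):
    """Listings marked 'removed' in the previous snapshot and 'live' in the current one."""
    return sorted(
        listing_id
        for listing_id, status in current.items()
        if status == "live" and previous.get(listing_id) == "removed"
    )


# Hypothetical snapshots from two consecutive monitoring runs.
previous = {"listing_1": "removed", "listing_2": "live", "listing_3": "removed"}
current = {"listing_1": "live", "listing_2": "live", "listing_3": "removed"}
to_report_again = find_reinstated(previous, current)
```

Listings that were never removed, or that are still removed, are left out, so the result is just the set that needs to be reported again.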

1 comments

I don't care if you use web scraping to solve the Israeli / Palestinian conflict. You're not entitled to anyone's data, computers, services, etc. because you've decided for altruistic reasons that it is appropriate.

Cool use case. Love it. Fascinating stuff. But if Google told you to stop, would you? Or would you instead decide to build a 5-server cluster of 200 4G modems spread across continents to continue your work? Because if you did, I would assume that you've decided to move on from a cute little altruistic process into a commercial use of someone else's data to make a profit.

Wait - so you are saying that information on the public internet isn’t public? Man, I wish people would remember the origin of the web and the entire reason it exists. If you don’t want information public, protect it - otherwise, I say it’s fair game.

Remember the OP article is about a system that is designed to completely and directly circumvent protections.

If an organization puts a series of processes in place to prevent scrapers from taking data wholesale in violation of its terms of service, and you develop a 5-server cluster of 200x 4G modems, it's no longer "fair game" and you're directly being unethical in your use of someone else's services.

Yeah, I think it's fair to say that in the presence of anti-bot measures (whether they work or not), the content on the website isn't public anymore.

Available to someone meeting certain criteria (student discount, senior discount) doesn't mean available to anyone. I see no reason why "not available to be consumed by autonomous agents" is any less valid a restriction than unlimited refills being available only to humans and not robots.

I agree that there is a line at using someone else’s data to make a profit, but it is kind of ironic that you mention Google, because their exact business model is scraping websites to feed their search results and litter them with ads to make a profit. For me there is a big line between aggregating publicly available data (search results, reviews, news, job postings, etc.) and intentionally violating terms of service, like signing up for fake accounts and harvesting user data. So entitled, maybe not (sites can try to prevent you from scraping), but if you make something publicly available you shouldn’t be surprised when people use it in ways you may not originally have intended (within legal boundaries of course).

> I don't care if you use Web scraping to solve the Israeli / Palestinian conflict.

Maybe you should, though. It's always worth thinking about which giant's shoulders you're standing on. It's giants all the way down.

> cute little altruistic process

Maybe it is not the opinion which is unpopular, but the way it is being presented.