by senko 1370 days ago
> you are providing a service that scrapes videos from YouTube. ... > I could completely understand why Google would not want such a feature in your product

So, when Google scraped the internet to provide search engine, that was okay, but when some other site scrapes something, it is not?

If they are doing something illegal, by all means sue or report them.

But Google acting as police, judge, jury and executioner here shows why it's a problem to have a single unregulated company with so much power.

5 comments

Google did not only scrape the web.

- Google recorded half of the planet with cameras scanning license plates, addresses, faces. You see those blurred in Google Maps, but the license plates and faces are not blurred in the internal data at Google. I have yet to see confirmation that no data correlation is occurring.

- They scanned the Wifi MAC addresses associated with physical addresses "by mistake".

- They scanned public records of companies and published them, including sometimes the private address of the owner, and they invite you as the owner to "take ownership" of their Google Maps data...

> They scanned the Wifi MAC addresses associated with physical addresses "by mistake".

https://www.theguardian.com/technology/2010/may/15/google-ad...

  German request for data audit reveals the web giant 'accidentally' stored payload information from open networks

> Google recorded half of the planet with cameras

not to be a google apologist, but they only ever record public areas. Do you hold tourists that take photos to the same standard?

> are not blurred in the internal data at Google.

whatever is in internal google is irrelevant, since the data is not exposed publicly. Do you expect a tourist who took the same photo with a license plate to blur it out in their own private album? I would expect a tourist who publishes the photo (say, on facebook) to blur out the license plate, but not when it's in their own private album.

> sometimes the private address of the owner

how did google get the private address of the owner of a public business? I don't quite understand the claim to the wrong doing.

A single person that collects data at the same scale as Google? I will absolutely hold him to the same standard.

I mean, that was a plot point in a Batman movie not too long ago. And even Batman doesn’t deserve that kind of power.

> whatever is in internal google is irrelevant, since the data is not exposed publicly

The data may not be exposed publicly but can still be sold or used against you.

> they only ever record public areas. Do you hold tourists that take photos to the same standard?

If those tourists are creating a massive database of everybody, then yes.

The issue isn't really the recording itself. The issue is what is done with the data afterwards.

>not to be a google apologist, but they only ever record public areas. Do you hold tourists that take photos to the same standard?

Tourists don't do the same thing. Why do you think there is almost no Google Street View in Germany?

> whatever is in internal google is irrelevant, since the data is not exposed publicly...

GDPR -> "What information must be given to individuals whose data is collected?"

https://ec.europa.eu/info/law/law-topic/data-protection/refo...

> how did google get the private address of the owner of a public business? I don't quite understand the claim to the wrong doing.

Google got it because some owners have their company registered to their own private address. The problem is the extra step of publicly exposing that info and pressuring you to take ownership of it, because otherwise it will be shown anyway on Google Maps, a product that currently has 98% of the market. Why do you think you have the option of blurring the view of your house in Google Maps?

> Do you hold tourists that take photos to the same standard?

Scale, intent and usage matter in law.

> they only ever record public areas

They tried to get around that with Pokemon Go with some success. I don't know if all that content they scraped is publicly available however.

What is a public area? Anywhere that can be viewed from any spot accessible to the public? So the only non-public areas are rooms with no windows?

If I go to Google Street view I can see a lot of details from inside my house, same with the houses of my neighbors.

If you stop and stare inside someone's window in public, people will notice and wonder what the hell you're doing. Nothing's stopping anyone from staring through people's windows through Google Street view though.

In most "free" countries, people are free to stand outside your house on the public street and look towards your house as much as they want. They can even take photos.

So you can wonder what the hell they're doing as much as you want, because what you're suggesting is perfectly legal.

> So, when Google scraped the internet to provide search engine, that was okay, but when some other site scrapes something, it is not?

No, but when Google scraped the Internet, nobody did anything about it.

The value they added by making everything searchable was so great.

Giving content creators platform flexibility is clearly a threat to YouTube.

Double standards are twice as good.

> The value they added by making everything searchable was so great.

Altavista, yahoo

++Google

Everyone was within their rights to deny Google access to their machines for scraping, but that would have been impossible to do. The web is only a pseudo-public apparatus, filled with private computers. While you could technically deny access to a specific person, you could never actually implement that or follow it up legally.

While scraping, Google follows robots.txt file rules
https://www.youtube.com/robots.txt

Looks like scraping is allowed then, if your robot is named "Mediapartners-Google*". And you can name your robot whatever you like (there's no legally enforced or de facto registry), so my robot:

"Mediapartners-Google-im-cleary-not-a-member-of/1.0"

Can legally scrape anything on google's youtube. Perfect.
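
For anyone who wants to see why the name is all that matters, here is a rough sketch using Python's standard `urllib.robotparser`. The rules below are made up for illustration (they are not YouTube's actual robots.txt), and `example.com`, `SomeOtherBot`, and the `allowed` helper are hypothetical. The check runs entirely on the client and only looks at the user-agent string the client chooses to present.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only, not any real site's robots.txt.
SAMPLE_RULES = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_RULES.splitlines())


def allowed(user_agent: str, path: str) -> bool:
    """Ask the parsed rules whether this self-declared agent may fetch path."""
    return parser.can_fetch(user_agent, "https://example.com" + path)


# Nothing verifies that the caller really is the named crawler;
# the rules are matched against whatever string the client supplies.
print(allowed("Mediapartners-Google/1.0", "/private/video"))
print(allowed("SomeOtherBot/1.0", "/private/video"))
print(allowed("SomeOtherBot/1.0", "/public/video"))
```

Whether a longer name that merely starts with the allowed token also matches depends on the matching rule of the particular parser, so I wouldn't count on that part. Either way this only shows what the file says; a site can still block by other means.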

Robots.txt is a thing.

If robots.txt were a thing, Google would never have built its search engine. Everyone would just allow Altavista and disallow everyone else, like they do with Googlebot and Bingbot now.

Also, the issue here is not that Google banned their tool from accessing YouTube because the tool didn't obey robots.txt. The issue is that Google, the search engine, nuked their site, because they didn't like its content.