Hacker News
by chatmasta 3029 days ago
If they are using the data internally (e.g. for ML training), they can get away with not buying it. They already scrape every website in existence and feed that data to ML training algorithms; it's not like Google needs to pay for a data set just to use it internally.
2 comments

Publishing something does not make it public domain. I am not a lawyer, but saying they can do anything with what they scrape seems questionable.
I mean that in practice they can get away with it. As of now, no audit trails are required for training machine learning algorithms.
I don't think that there's an issue with someone reading 100 reviews of a certain restaurant on the internet, and summarizing them by writing a new review:

"Most people who talked about it mentioned the great salad bar, but said that the fish was subpar. They commented on it being lively and child-friendly."

That's what I'd guess they are doing.

Zagat likely had much more data than what was publicly available. I have no visibility into it, but I can imagine non-public data that would be useful for training, such as individual reviewer biases or reviews tracked through their edit lifecycle.