| HN Mirror


	by chatmasta 3029 days ago
	If they are using the data internally (e.g. for ML training), they can get away with not buying it. They already scrape every website in existence and feed that data to ML training algorithms; it's not like Google needs to pay for a data set just to use it internally.

2 comments

sf_rob 3029 days ago

Publishing something does not make it public domain. I am not a lawyer, but saying they can do anything with what they scrape seems questionable.

chatmasta 3029 days ago

I mean that in practice they can get away with it. As of now, no audit trails are required for training machine learning algorithms.

ucaetano 3029 days ago

I don't think that there's an issue with someone reading 100 reviews of a certain restaurant on the internet, and summarizing them by writing a new review:

"Most people who talked about it mentioned the great salad bar, but said that the fish was subpar. They commented on it being lively and child-friendly."

That's what I'd guess they are doing.

prepend 3029 days ago

Zagat likely had much more data than what was publicly available. I have no visibility into it, but I can imagine non-public data that would be useful for training, like individual reviewer biases or reviews through their edit lifecycle.