Hacker News new | ask | show | jobs
by gcbw3 2172 days ago
Considering the close relationship with Google and Wikimedia https://en.wikipedia.org/wiki/Google_and_Wikipedia and the considerable money Google gives them, how can one not see this project as "crowdsourcing better training data-sets for Google?"

Can the data be licensed as GPL-3 or similar?

4 comments

That's an incredibility zero-sum way of looking at the world.

Almost every research group and company doing NLP work uses Wikipedia I'd say it is a fantastic donation by Google which improves science generally.

> Can the data be licensed as GPL-3 or similar?

It's under CC BY-SA and (with a few exceptions) the GNU Free Documentation License.

I dont think the relationship is that close - all it says is google donated a chunk of money in 2010 and in 2019, it was a large chunk of money(~3% of donations) but not like so much to make a dependency.

> Can the data be licensed as GPL-3 or similar?

Pretty unlikely tbh. I dont know if anything is decided for licensing, but if it is to be a "copyleft" license it would be cc-by-sa (like wikipedia) since this is not a program.

Keep in mind that in the united states, an abstract list of facts cannot be copyrighted afaik (i dont think this qualifies as that, wikidata might though)

How so? Wikimedia-provided data can be used by anyone. Google could have kept using and building on their Freebase dataset had they wanted to - other actors in the industry don't have it nearly as easy.
Denny seems to be leaving Google and joining Wikimedia Foundation to lead the project this month, so probably you do not need to worry too much about Denny's affiliation with Google.