Hacker News
by Tenoke 1804 days ago
Yes, it would stifle NLP research immensely, and we likely wouldn't see anything better than GPT-3 for years if such restrictions were put in place.
3 comments

You're basically seeing how some people would have had open source play out. You can look at and use the code but not to make money or in any other way that I personally disapprove of. This is a world where open source would have ended up being pretty much irrelevant.
Aren't we also seeing now why people would want to do that? A multi-billion-dollar company is using people's work to make more profit without paying them.

I definitely understand why people pick a license that disallows uses they don't agree with. Imagine baking cookies for your friends, and one of them reselling them. The material effect is the same to you: you gave away your cookies either way. But sometimes you make or do something for a certain group of people, and not for others to make a profit off your work.

People can do whatever they want with their work, including not sharing it at all.

But a great deal of the value that's come from open source generally has been that open source licenses haven't imposed the sort of usage-based restrictions (e.g. free for educational use only) that were fairly common in the PC world.

And, to your example, in the case of software the incremental copy that your friend sold cost you absolutely nothing. So it comes down to a purely emotional response to someone else making money off something you made.

>So it comes down to a purely emotional response to someone else making money off something you made.

Exactly, as I said, the material situation is the same. But we are all emotional beings; you would do certain things for your family that you wouldn't do for strangers. I don't think this case is any different.

I personally don't work for free for a company, but I do charity work for free. Working for a company in the time I work for a charity would "cost me absolutely nothing" if I already spend the time anyway, but everyone understands the difference.

There is a difference between a model that achieves "fair use" of copyrighted work and one that regurgitates copyrighted work without attribution.
You’re free to privately research with this data but commercializing other people’s work using ML is theft.

Edit: commercialization of the derived work is one factor explicitly considered under US law in making a fair use determination. That said, even if it weren’t commercialized it may still be infringement, and I believe it is.

Commercializing isn't really the issue, it's still copyright infringement even if you release it for free (i.e. piracy) -- it's unauthorized redistribution (i.e. copying).
Even if we accept that (which many wouldn't, as most licenses say little about research), the research would never be very useful if you can never make a comparable dataset to use in the real world.
I get that the problem is commercializing, but the theories around copyright that are being deployed here would prevent even free, open-source NLP research from becoming a reality.