by qayxc
1812 days ago
> Just like how people are allowed to read websites, but scraping is often disallowed.

Hosting code on GitHub explicitly allows this type of usage (scraping) according to their TOS, so I have to ask again: why the sudden complaints?

Are we still talking about a shortcoming of the ML model, which very occasionally spits out a few lines of copied code? Or should we include search engines in this too, since they do exactly the same thing by design?

robots.txt, for example, is also purely advisory and non-binding, and Common Crawl [0] (also used for training GPT-3) publishes a dataset that by definition contains GPL'ed code as well, no matter where it's hosted. So is that off-limits now, too?

[0] http://commoncrawl.org