by gridit
3362 days ago
"'If Google could find a way to take that corpus, sliced and diced by genre, topic, time period, all the ways you can divide it, and make that available to machine-learning researchers and hobbyists at universities and out in the wild, I’ll bet there’s some really interesting work that could come out of that. Nobody knows what,' Sloan says. He assumes Google is already doing this internally. Jaskiewicz and others at Google would not say."
For books that are scanned but carry no extra licensing, would Google be allowed to do anything with the data? Create a very delocalized n-gram set? Use it as the "test" set (not even cross-validation, where it might influence hyperparameters) for an ML algorithm?
Edit: I would love to know where Google's authorization for the n-gram set derives from. Somewhere in the judge's orders? A negotiated fee with the Authors Guild?
|
For example, re: n-grams:
""" Similarly, Google Books is also transformative in the sense that it has transformed book text into data for purposes of substantive research, including data mining and text mining in new areas, thereby opening up new fields of research. Words in books are being used in a way they have not been used before. Google Books has created something new in the use of book text - the frequency of words and trends in their usage provide substantive information. [...]
On the other hand, fair use has been found even where a defendant benefitted commercially from the unlicensed use of copyrighted works
"""
Oh man, this is mind-blowing.
[0] https://copyright-casebook.com/about/recent-cases-edited/aut...