by georgebarnett
922 days ago
What I'm wrangling with is this: I agree that a particular sequence of words is copyrightable. What I'm struggling with is that facts _about_ that corpus of text are not copyrightable. A simple fact could be that the 5th word is "bar", the 6th word is "jazz", and so on. A model is trained from these "facts" across many source documents. It is thus itself a derived "fact", given a set of training inputs and parameters, so how could _that_ be copyrighted? Put another way: there's the original text, and then... is it turtles all the way down, and none of it can be copyrighted because it's all math and calculations derived from that text?