by gpm 930 days ago
This feels like arguing that, because it's a fact that the first Game of Thrones book consists of <this text>, <this text> (the entirety of the book) isn't copyrightable.

If the bit sequence is likely to occur because it's someone else's creative content (or part of it is)... that doesn't seem like it can be a 'fact' in the relevant manner.

1 comment

What I'm wrestling with is this:

I agree that a particular sequence of words is copyrightable.

What I'm struggling with is that facts _about_ that corpus of text are not copyrightable. A simple fact could be that the word "bar" is the 5th word. The 6th word is "jazz". Etc.
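
To make the "facts about the text" idea concrete, here is a minimal sketch. The sample sentence and both function names are made up for illustration and come from no real work or library. It shows that a complete list of (position, word) facts is enough to rebuild the text exactly:

```python
# Hypothetical illustration: the sample sentence and function names are invented.
def text_to_facts(text):
    """Turn a text into a list of (position, word) 'facts', 1-indexed."""
    return [(i, word) for i, word in enumerate(text.split(), start=1)]


def facts_to_text(facts):
    """Rebuild the text from the positional facts alone."""
    return " ".join(word for _, word in sorted(facts))


sample = "we met at the bar jazz played all night"
facts = text_to_facts(sample)

print(facts[4])  # (5, 'bar')  -> "the 5th word is 'bar'"
print(facts[5])  # (6, 'jazz') -> "the 6th word is 'jazz'"
print(facts_to_text(facts) == sample)  # True: the facts reproduce the text
```

In this toy case the full set of facts and the text carry the same information, which is arguably why the line between "a fact about the text" and "the text" is hard to pin down.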

A model is trained from these "facts" across many source documents. It is thus itself a derived 'fact' given a set of training inputs and parameters, so how could _that_ then be copyrighted?

Put another way: there's the original text, and then... is it turtles all the way down, with none of it copyrightable because it's all math and calculations derived from that text?