I'm pretty sure ChatGPT doesn't know its sources. If it generated "the cat sat on the mat", then what (even from a theoretical POV) is the source of the word "mat"? Note that it's not pulling the whole "cat sat on the mat" sentence from anyplace - that's not how it works - it's just generating this one word at a time, based on the statistics (collected over all the text it was fed) of which word is likely to follow what came before.
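To make the "one word at a time" point concrete, here's a toy sketch. To be clear about the assumptions: this is a tiny word-count model over a made-up four-sentence corpus, not how ChatGPT is actually built (that's a neural network over tokens, trained on vastly more text). The corpus and the names `build_counts` and `generate` are invented for illustration. It only shows the basic idea of picking the next word from statistics of what followed the preceding words.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus. Note that "the cat sat on the mat" is NOT in it.
corpus = [
    "the cat sat on the sofa",
    "the dog sat on the mat",
    "a bird sat on the mat",
    "the cat sat on the rug",
]

def build_counts(texts):
    """Count which word follows each pair of consecutive words."""
    counts = defaultdict(Counter)
    for text in texts:
        words = text.split()
        for a, b, c in zip(words, words[1:], words[2:]):
            counts[(a, b)][c] += 1
    return counts

def generate(counts, start, max_words=10):
    """Extend `start` one word at a time, always taking the most frequent follower."""
    words = start.split()
    while len(words) < max_words:
        followers = counts.get(tuple(words[-2:]))
        if not followers:
            break  # nothing ever followed this pair in the corpus
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

counts = build_counts(corpus)
sentence = generate(counts, "the cat")
print(sentence)
```

Starting from "the cat", this produces "the cat sat on the mat", a sentence that appears nowhere in the toy corpus. It falls out of the pooled counts, which is the sense in which there's no single source to point at.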
So, who gets credit for the word "mat" being generated in that context? I guess any texts talking about cats and mats in close proximity may deserve some of the "credit", but it goes way deeper than that: why did ChatGPT choose to output such a trite sentence (albeit while only selecting one word at a time), rather than something else about cats, or perhaps a more interesting thing that cats often sit in/on?
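Here's a rough sketch of why the "credit" question gets murky even in the simplest possible case. Same caveats as any toy: the documents, their names, and the helper `sources_for` are all hypothetical, and a real model blends its training text into shared weights, so attribution there is far more diffuse than a lookup like this suggests.

```python
# Hypothetical named "sources"; none of them is a real document.
sources = {
    "doc_a": "the cat sat on the sofa",
    "doc_b": "the dog sat on the mat",
    "doc_c": "a bird sat on the mat",
    "doc_d": "the cat sat on the rug",
}

def sources_for(texts, context, word):
    """Return the names of texts in which `word` directly follows the two-word `context`."""
    hits = []
    for name, text in texts.items():
        words = text.split()
        for a, b, c in zip(words, words[1:], words[2:]):
            if (a, b) == context and c == word:
                hits.append(name)
                break  # one hit per text is enough
    return hits

# Which texts taught the toy model that "mat" follows "on the"?
mat_credit = sources_for(sources, ("on", "the"), "mat")
print(mat_credit)
```

In this made-up corpus the texts that put "mat" after "on the" are the ones about a dog and a bird. Neither mentions a cat, yet they are what would tip a count-based model toward "mat" at the end of a cat sentence.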
People seem to assume that ChatGPT is pulling entire "facts" from various sources, but that's just not how it works - it's just feeding all the texts into a giant meat grinder of word statistics. It knows about words, not facts.