|
I think the BS-generation problem with ChatGPT goes far deeper than citing sources, for a variety of reasons. 1) It's not a search engine, even if it behaves a bit like one. It's not "retrieving answers" to your questions (from sources that it could choose to cite). ChatGPT is really just a "language model", so it has no notion that what you're typing is even a question or query. Your input is just treated as a sequence of words (which ChatGPT has zero understanding of), and ChatGPT's response is a further sequence of words that it has calculated to be one statistically probable continuation of what you typed. You can keep asking it for alternative answers, and it'll keep generating alternative statistically probable continuations. The websites etc. that ChatGPT was trained on are just sources of language that it consumed in order to learn the statistics that let it make these continuation predictions. It's not memorizing "facts" from websites, just word statistics, and these are mixed in with the statistics from all the other sources it was trained on. If it generates the word "walk" as part of a response, it can't cite a source for that, since there essentially is none: only a bazillion text sources it was trained on that collectively made the word "walk" a high-probability continuation of the words it had generated leading up to that... 2) Even if ChatGPT had been designed to deal in "facts" (rather than word statistics) associated with specific sources, the bullshit problem isn't just knowing the varied reliability of the sources it was trained on, but how those "facts" are combined.
To combine multiple facts and correctly deduce something new from them would require intelligence, but ChatGPT doesn't have any intelligence. It's just a statistical word generator, so the way it combines snippets from different sources is again just statistical word generation, with zero knowledge of what the words it is generating mean or whether they make sense! What makes ChatGPT seem semi-intelligent is that a lot of what it was trained on was text written by semi-intelligent humans, so the "sequence of words" it generates, following the statistics of human language, seems like something a human might say... until you start paying attention to the meaning of the words and realize it's often good-sounding garbage. |
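
To make the "statistically probable continuation" idea concrete, here's a toy sketch in Python. To be clear, this is not how ChatGPT is actually built (it's a large neural network, not a word-pair table); it's just a minimal bigram model, with a made-up corpus and hypothetical function names (`train_bigrams`, `continue_text`), that shows the same basic behaviour: word statistics pooled from all the training text, sampled to produce a plausible-sounding continuation with no single source to cite.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it and how often."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def continue_text(counts, prompt, n_words=8, seed=None):
    """Extend a non-empty prompt by sampling each next word in
    proportion to how often it followed the previous word in training."""
    rng = random.Random(seed)
    out = prompt.lower().split()
    for _ in range(n_words):
        followers = counts.get(out[-1])
        if not followers:
            break  # last word was never seen with a follower
        words, weights = zip(*followers.items())
        out.append(rng.choices(words, weights=weights, k=1)[0])
    return " ".join(out)


# Made-up training text: several "sources" blended into one set of counts.
corpus = (
    "i like to walk in the park . "
    "i like to walk to work . "
    "dogs like to run in the park . "
    "we walk to the shop ."
)
model = train_bigrams(corpus)

# Different seeds play the role of "asking again for another answer".
for seed in range(3):
    print(continue_text(model, "i like to", seed=seed))
```

Note that once the counts are built, nothing records which sentence a given "walk" came from. The output can also splice pieces of different sentences into a statement that never appeared anywhere in the training text, which is the small-scale version of good-sounding garbage.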
Useful for fiction, advertising copy, and literary criticism. Not so good for fact retrieval.