by 453qtgreq 1292 days ago
>Is the claim that these were all simply copy and pastes of something on the internet in their entirety? And that as such the internet already seems to contain essentially every permutation of everything I could ask ChatGPT, as to me this sounds highly implausible.

It's the training data supplied to GPT-3 (as explained by OpenAI themselves), so yes, it is literally true. You are just seeing snippets of the internet, re-formed and regurgitated.

It can only do what you ask.

2 comments

So I appreciate the gist of your point, but the way these models work is rather more complicated than copying and pasting snippets, so it certainly is not 'literally' true. The models are trained to predict sub-word-level tokens from the internet training dataset, so the level of re-formation and regurgitation in a generated sentence can be vast, to the point of the final sentence being novel in its own right.
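To make that concrete, here is a toy sketch, not how GPT-3 actually works: it uses whole words as tokens and a simple "which token was seen after which" table instead of sub-word tokens and a neural network. The two-sentence corpus and the function names are made up for illustration. Even this crude next-token predictor can emit sentences that appear nowhere in its training data.

```python
from collections import defaultdict

# Toy "training data": two hypothetical sentences, for illustration only.
corpus = ["the cat sat on the mat", "the dog lay on the rug"]


def train_bigram(sentences):
    """Record, for each token, which tokens were seen directly after it."""
    successors = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.split()
        for current, nxt in zip(tokens, tokens[1:]):
            successors[current].add(nxt)
    return successors


def enumerate_outputs(successors, start, length):
    """List every token sequence of the given length the model could emit."""
    sequences = [[start]]
    for _ in range(length - 1):
        # Extend each partial sequence by every token seen after its last token.
        sequences = [
            seq + [nxt]
            for seq in sequences
            for nxt in sorted(successors.get(seq[-1], ()))
        ]
    return [" ".join(seq) for seq in sequences]


model = train_bigram(corpus)
outputs = enumerate_outputs(model, "the", 6)

# Sentences the model can produce that were never in the training data.
novel = [s for s in outputs if s not in corpus]
print(novel)
```

Every adjacent word pair in the output was seen in training, yet most of the full sentences (for example "the cat sat on the rug") were not. A real LLM recombines at a much finer grain and with far more context, so the gap between "seen pieces" and "novel whole" is correspondingly larger.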
100% of the training data of the salty jelly in a human skull is sensory input.

Were your argument flawless and your conclusion correct, then all human creativity would "literally" be a remix of things in the natural world, as even when we remix things made by other humans that too would ultimately derive back to nature.

This can certainly be asserted, depending on how you wish to use those words, but it is not useful for predicting what our abilities are. For example, a perfect intellect could predict quantum mechanics from scratch by watching a campfire die down on a rainy night, but we didn't do that in one step in the Neolithic. Likewise, it isn't useful for telling us what the limits of GPT-family LLMs might be: what has been built on such inputs, both in the case of humans and of this particular AI, greatly exceeds the imagination of any single individual.