|
It is vaguely dystopian, true, but one thing I remember in order to sleep at night is that ChatGPT and the like are trained on human-written text. So we might currently be looking at ChatGPT at its very best, or close to its very best. Reasoning: from here on out, the stuff it trains on will be polluted with automatically generated text. Photocopies of photocopies eventually lead to blurrier and crummier images of the real thing. We can keep paying people to come up with optimisations to the algorithm itself, and keep paying annotators to manually pepper human common sense into the system, but it's my theory that these payments won't keep up with the spread of automatically generated content in the source dataset and the negative impact that has on the language model the algorithm outputs. ChatGPT currently enshrines insight and style from 2020-2021 (more-or-less indistinguishable from insight and style from 2022-2023), but now that the system exists, rather than observing a rapid pace of new writing styles and original insights emerging on the web of 2024, we'll potentially see a slightly slower rate of emergence, then the next year an even slower one. This will continue until the spoken language and the language of the World Wide Web have completely diverged, similar to the way 1950s film dialogue bore little resemblance to 1950s speaking styles. Short-term, ChatGPT has called creative pursuits into question, but long-term, I think such systems will strongly validate creative pursuits, and only really replace non-creative roles. By turning the web into a wasteland of written cruft, GPT will validate the need for human flourishes, errors, divergences from the norm and the arbitrary rewriting of unspoken rules.
I think only a strong AI raised like us in our own societies could infuse that kind of culture into its writing, but the process of developing such an AI would basically just be a reinvention of slavery, and we probably don't have the resources here on Earth to support it long term anyway. |
This is just a very convoluted way of manually labeling data as good and bad.