HN Mirror


by snhly 1243 days ago

It is vaguely dystopian, true, but one thing I remind myself of in order to sleep at night is that ChatGPT and the like are trained on human-written text. So we might currently be looking at ChatGPT at its very best, or close to it. My reasoning: from here on out, the text it trains on will be polluted with automatically generated text. Photocopies of photocopies eventually produce blurrier, crummier images of the real thing.

We can keep paying people to come up with optimisations to the algorithm itself, and keep paying annotators to manually pepper human common sense into the system, but my theory is that these payments won't keep up with the spread of automatically generated content in the source dataset, or with the negative impact that content has on the language model the algorithm produces.

ChatGPT currently enshrines the insight and style of 2020-2021 (more or less indistinguishable from the insight and style of 2022-2023). But now that the system exists, rather than seeing new writing styles and original insights emerge on the web of 2024 at a rapid pace, we may see them emerge slightly more slowly, then the next year more slowly still. This will continue until spoken language and the language of the web have completely diverged, much as 1950s film dialogue bore little resemblance to how people actually spoke in the 1950s.

In the short term, ChatGPT has called creative pursuits into question, but in the long term I think such systems will strongly validate them, and will only really replace non-creative roles. By turning the web into a wasteland of written cruft, GPT will validate the need for human flourishes, errors, divergences from the norm, and the arbitrary rewriting of unspoken rules. I think only a strong AI raised like us, in our own societies, could infuse that kind of culture into its writing. But developing such an AI would basically be a reinvention of slavery, and we probably don't have the resources here on Earth to support it long-term anyway.

2 comments

great_psy 1243 days ago

I agree with the premise that content will get more polluted, but there is an element of human voting every time we choose a prompt output and say “this is good enough for me to post/use/turn into a book”.

That is just a very convoluted way of manually labeling data as good or bad.


mestelan 1242 days ago

It would be interesting to see whether a label emerges to denote content created before ChatGPT, e.g. "certified pre-2023 AI-free content".

Also, it would be possible to train bots on an archive of such material (though it would accordingly be out of date, and so less useful in numerous ways).


snhly 1242 days ago

China does have new laws requiring people on the web to indicate with a watermark (or similar) if their content was created with the help of AI. See: https://cacm.acm.org/news/267778-china-bans-ai-generated-med...

Even if Western governments adopt similar laws, however, I'm not sure they would be that effective. People would start messing with the definition of AI. E.g. 80 years ago, a spelling and grammar checker would probably have fit society's definition of AI, and both of those technologies arguably have a cultural impact on the web. Spellcheckers lead to fewer new words, or dialectal variations of words, coming into existence, for example.
