by kikokikokiko
1110 days ago
The only datasets that will be useful for training LLMs in the future will be the ones generated before 2022. Any content generated after that date will be analogous to steel forged after 1945: inevitably contaminated by the "radioactivity" of LLMs. The good news is that the supply of data for training more and more powerful models will soon be gone; the bad news is that it will take the internet as we know it down with it. It will be a sad day when most HN posts are AI-generated, but that day will come; it's pretty much inevitable. The post above us is just a drop in an ocean of garbage generators that are just starting to pop up all around the old human web that we used to "love". We'll probably miss old Twitter someday, as ridiculous as it sounds.

I don't know if there is any escape from this for native English speakers, though.