HN Mirror


by mmazing 701 days ago

They make it clear in the paper that their primary "real-world" concern is that it's difficult to distinguish synthetic data from real human interaction when scraping data from the web. This will only get worse over time with our current way of doing things.

How are they supposed to deliberately train on synthetic data when they can't tell whether a given piece of data is synthetic or not?

Also, do you not feel that it is presumptuous to dismiss a body of work in a few sentences with a "seems fine to me"?

1 comment

simonw 701 days ago

In this case I wasn't reacting to this specific paper so much as to the widespread idea (at least as I've observed it among AI skeptics) that "model collapse" is a huge problem.
