by mmazing
701 days ago
They make it clear in the paper that their primary "real-world" concern is that it's difficult to distinguish synthetic data from real human interaction when scraping data from the web. This will only get worse over time with our current way of doing things. How are they supposed to deliberately train on synthetic data when they don't know whether the data is synthetic or not? Also, do you not feel that it is presumptuous to dismiss a body of work in a few sentences with a "seems fine to me"?