Hacker News
by endisneigh 917 days ago
This makes no sense lol. The information OpenAI is using is cleaned to begin with.
1 comment

Raw text from a website, including header text, footers, links, images, etc., is very dirty stuff.
The actual content is the clean stuff. If you disagree, then you're accepting that OpenAI could just create all the content themselves instead of scraping it, and scraping is the comparatively trivial part.