Hacker News
by aik 913 days ago
Raw text from a website, including header text, footers, links, images, etc., is very dirty stuff.

The actual content is the clean stuff. If you disagree, then you're accepting that OpenAI could just create all the content themselves instead of scraping it, and scraping is trivial by comparison.