Hacker News
by
aik
913 days ago
Raw text from a website, including header text, footers, links, images, and so on, is very dirty stuff.
1 comment
endisneigh
913 days ago
The actual content is the clean stuff. If you disagree, then you accept that OpenAI could just create all the content themselves instead of scraping, which is comparatively trivial.