Hacker News
by transcriptase 1090 days ago
One of the questions I have is whether models are being trained on the SEO {spam|blogspam|adsense optimized|spun} websites.
1 comment

Almost certainly. The web crawl data that GPT (and similar) LLMs are trained on is far too large to be entirely curated.