Does anyone know if Hacker News comments are being used as training data? I wonder the same about Gmail, Skype, voice conversations on Xbox Live, etc. I've mostly been too afraid to ask because it sounds like paranoia.
Probably. HN is fairly plain HTML, so Common Crawl should have no trouble crawling it. I'm not aware of any HN opt-out there (an opt-out would go against the usual public accessibility of everything on HN to APIs, projects, etc.), nor would any of the obvious data-filtering measures filter it out.
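For anyone who wants to check the opt-out question for a given site themselves, here is a rough sketch using Python's standard `urllib.robotparser`. It only tests whether a robots.txt file blocks Common Crawl's crawler, whose user agent is `CCBot`. The two robots.txt samples, the `crawler_allowed` helper, and the example.com URL are made up for illustration; they are not HN's actual file, and robots.txt is only one of the ways a site could opt out.

```python
from urllib.robotparser import RobotFileParser


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt contents, NOT copied from any real site.
OPTED_OUT = """\
User-agent: CCBot
Disallow: /
"""

NO_OPT_OUT = """\
User-agent: *
Disallow: /private/
"""

# Placeholder URL for illustration only.
URL = "https://example.com/item?id=1"

print(crawler_allowed(OPTED_OUT, "CCBot", URL))
print(crawler_allowed(NO_OPT_OUT, "CCBot", URL))
```

With these samples the first call should print `False` (the site blocks `CCBot` outright) and the second `True` (only `/private/` is off limits). To check a live site, you would feed the text of its real robots.txt into the same helper.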
It seems pretty safe to assume that anything you create in public forums (and someday maybe "private" ones with data-sharing arrangements) is or will be used as training data.