Hacker News new | ask | show | jobs
by notahacker 1207 days ago
I think the more important question is whether the notion of "indexing" for search purposes accurately describes what a corpus of data that never refers back to its source material does.

Ideally, you'd have an updated version of Robots.txt which specified whether it allowed agents to use content in training LLMs or not, which different content publishers would set differently for different reasons