Hacker News new | ask | show | jobs
by pizza 677 days ago
If the samples came from HN, I wonder how likely it is that the text is already a part of a dataset (ie common crawl snapshot) so that the LLMs have already seen them?

edit: judging from the comments I saw, they were all quite recent, so I guess this isn't happening. Though I do know that ChatGPT can sometimes use a Bing search tool during chats, which can actually link to recently indexed text, but I highly doubt that the gpt4o-mini API model is doing that.