by Nextgrid
1107 days ago
> like the full texts of the subreddit dedicated to counting to a million

This was the source of the "anomalous tokens" phenomenon, where the usernames of prolific counters were yielding weird and unexpected behavior on the OpenAI models. While definitely an interesting scientific curiosity, is there a reason you'd actually want this in a production model?

EDIT: note that the "tokens" that trigger the "glitch" are not the numbers themselves but the usernames of the people counting on that subreddit. Those usernames appear nowhere in the training dataset, because a cleaning step removed the "counting" texts.