Hacker News
by GaggiX 1161 days ago
The article only establishes that the dataset used to train the tokenizer is biased, not that the entire dataset used to train the GPT model is.