by zhangce
969 days ago
There are actually a few ways to do this, and we have four:

- `rps_doc_ml_wikiref_score`: a classifier trained to distinguish random webpages from pages used as Wikipedia references (used in Llama-1)
- `ccnet_perplexity`: perplexity of an LM trained on Wikipedia (used in CCNet)
- `rps_doc_ml_wikipedia_score`: classifier prediction for the document being a Wikipedia article
- `rps_doc_wikipedia_importance`: used in https://arxiv.org/abs/2302.03169

You can see the full table here: https://together.ai/blog/redpajama-data-v2
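As a rough sketch of how signals like these might be combined into a filter: the thresholds below and the flat name-to-score dict are assumptions made up for illustration, not the actual on-disk layout or any recommended values, and `looks_wikipedia_like` is a hypothetical helper.

```python
from typing import Mapping

# Hypothetical thresholds, chosen only for illustration; tune them on your own data.
MIN_SCORE = {
    "rps_doc_ml_wikiref_score": 0.25,
    "rps_doc_ml_wikipedia_score": 0.25,
}
MAX_SCORE = {
    # Lower perplexity under a Wikipedia-trained LM means more Wikipedia-like text.
    "ccnet_perplexity": 500.0,
}


def looks_wikipedia_like(signals: Mapping[str, float]) -> bool:
    """Return True if every signal present in `signals` passes its threshold.

    `signals` is assumed to map a signal name to one document-level score.
    Signals that are missing from the mapping are simply skipped.
    """
    for name, lowest in MIN_SCORE.items():
        if name in signals and signals[name] < lowest:
            return False
    for name, highest in MAX_SCORE.items():
        if name in signals and signals[name] > highest:
            return False
    return True
```

Skipping missing signals is a deliberate choice here, so a document that only carries some of the scores is judged on what it has; a stricter filter could reject it instead.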