> how similar this document is to Wikipedia
So that’s a measure of how similar it is to a background vector built from all Wikipedia data in the language in question?
- `rps_doc_ml_wikiref_score`: a classifier that distinguishes pages used as Wikipedia references from random webpages (used in Llama-1)
- `ccnet_perplexity`: perplexity of an LM trained on Wikipedia (used in CCNet)
- `rps_doc_ml_wikipedia_score`: classifier prediction for the document being a Wikipedia article
- `rps_doc_wikipedia_importance`: Used in https://arxiv.org/abs/2302.03169
You can see the full table here: https://together.ai/blog/redpajama-data-v2
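To make the perplexity idea concrete, here is a minimal sketch of how a "trained on Wikipedia" language model can score similarity. It swaps in a toy unigram model with add-one smoothing, not the n-gram models CCNet actually uses, and the texts and the function names `train_unigram` and `perplexity` are made up for illustration. Lower perplexity means the document looks more like the reference text.

```python
import math
from collections import Counter

def train_unigram(text):
    """Count whitespace tokens in a reference corpus (stand-in for Wikipedia)."""
    tokens = text.lower().split()
    return Counter(tokens), len(tokens)

def perplexity(doc, counts, total):
    """Perplexity of `doc` under an add-one-smoothed unigram model."""
    tokens = doc.lower().split()
    if not tokens:
        raise ValueError("document has no tokens")
    vocab_size = len(counts) + 1  # one extra slot for unseen words
    log_prob = sum(
        math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    # Perplexity is exp of the negative mean log-probability per token.
    return math.exp(-log_prob / len(tokens))

# Hypothetical toy data: a tiny "Wikipedia" and two documents to score.
reference = "the city is the capital of the region and the city has a large population"
wiki_like = "the city is the capital of the region"
off_topic = "buy cheap pills online now click here"

counts, total = train_unigram(reference)
ppl_wiki = perplexity(wiki_like, counts, total)
ppl_off = perplexity(off_topic, counts, total)
print(f"wiki-like: {ppl_wiki:.2f}, off-topic: {ppl_off:.2f}")
```

The wiki-like sentence gets the lower score because its words are frequent in the reference text, while every word of the off-topic one is unseen.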