Genomes are sequenced when they're both accessible and are of scientific/economic interest. This results in an overrepresentation of:
- Humans and their pathogens
- Economically important organisms (e.g. crops and farm animals) and their pathogens
- Model organisms
For example, compare the number of Genebank sequences for HIV-1 [1] versus Feline immunodeficiency virus [2]. HIV-1 is additionally problematic because it has an extremely high mutation rate [3], making it more likely for a random sequence to match some sequenced HIV-1 genome.