by fastaguy88
1486 days ago
Sorry to post on a topic I know nothing about. To me, this looks very similar to local sequence similarity search (e.g. BLAST), where there are very rapid methods that use tuple-lookup and banded alignment to quickly identify "homologs" (the same entity). The nice thing about similarity searching algorithms is that they give you a very accurate probability of whether two strings are "homologous" (belong to the same entity). Perhaps I have the scale wrong, but it is routine to search thousands of queries (new entities) against hundreds of millions of sequences (known entities) in an hour or so (and sequences are typically an order of magnitude longer than entity names). The problem is embarrassingly parallel, and very efficient algorithms are available for entities that are usually very similar.
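To make the tuple-lookup plus banded-alignment idea concrete, here is a rough sketch applied to entity names. It is a toy illustration, not how BLAST or any real search tool is implemented: the function names, the tuple size `k=3`, the `min_hits` filter and the band width are all assumptions chosen for the example, and it reports a plain edit distance rather than a statistical probability of homology.

```python
from collections import defaultdict


def kmers(s, k=3):
    """Yield (offset, k-tuple) for every overlapping k-tuple in s."""
    for i in range(len(s) - k + 1):
        yield i, s[i:i + k]


def build_index(names, k=3):
    """Map each k-tuple to the (name_id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for name_id, name in enumerate(names):
        for off, tup in kmers(name, k):
            index[tup].append((name_id, off))
    return index


def candidates(query, index, k=3, min_hits=2):
    """Return ids of names sharing >= min_hits k-tuples on one diagonal."""
    diag_hits = defaultdict(int)
    for q_off, tup in kmers(query, k):
        for name_id, t_off in index.get(tup, ()):
            diag_hits[(name_id, t_off - q_off)] += 1
    return {nid for (nid, _), n in diag_hits.items() if n >= min_hits}


def banded_edit_distance(a, b, band=2):
    """Edit distance using only cells with |i - j| <= band.

    Returns None when the length difference alone exceeds the band.
    The result is exact whenever the true distance is <= band.
    """
    if abs(len(a) - len(b)) > band:
        return None
    inf = float("inf")
    # Row 0: turning the empty prefix of a into b[:j] costs j insertions.
    prev = {j: j for j in range(min(len(b), band) + 1)}
    for i in range(1, len(a) + 1):
        cur = {}
        for j in range(max(0, i - band), min(len(b), i + band) + 1):
            if j == 0:
                cur[j] = i  # delete the first i characters of a
                continue
            sub = prev.get(j - 1, inf) + (a[i - 1] != b[j - 1])
            dele = prev.get(j, inf) + 1
            ins = cur.get(j - 1, inf) + 1
            cur[j] = min(sub, dele, ins)
        prev = cur
    return prev.get(len(b))


def search(query, names, index, k=3, band=2, max_dist=2):
    """Tuple lookup to shortlist names, then banded alignment to score them."""
    hits = []
    for nid in candidates(query, index, k):
        d = banded_edit_distance(query, names[nid], band)
        if d is not None and d <= max_dist:
            hits.append((names[nid], d))
    return sorted(hits, key=lambda h: h[1])
```

The index is built once over the known entities, and each query only touches names that share tuples with it, which is where most of the speed comes from. Queries are independent of one another, so they can be split across workers without coordination.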