by BioGeek
556 days ago
> Also can we train this same model on regular language data so we can converse about the genomes?

Yes! That is what has been done in ChatNT [1], where you can ask natural-language questions like "Determine the degradation rate of the human RNA sequence @myseq.fna on a scale from -5 to 5." and ChatNT will answer with "The degradation rate for this sequence is 1.83."

> My biggest point of confusion is what type of practical things these models can do.

See for example this notebook [2], where the Nucleotide Transformer is finetuned to classify genomic sequences into two of the most basic genomic motif types: promoters and enhancers.

Disclaimer: I work at InstaDeep but was not involved in either of the above projects.

[1] https://www.biorxiv.org/content/10.1101/2024.04.30.591835v2
[2] https://github.com/huggingface/notebooks/blob/main/examples/...
The reason I ask is that I have a bunch of genes where I can't get much better than a 1:many orthology mapping. If this method can capture related promoters, intronic regions, etc. per gene, and tell me whether they are related, that would be a huge help (assuming it works on eukaryotic genomes).
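One common way to set up the comparison asked about above is to turn each candidate's promoter (or intronic) sequence into a vector and rank the candidates by cosine similarity to the query gene's sequence. The sketch below is only an illustration under stated assumptions: `kmer_profile` is a hypothetical stand-in embedding (a normalized k-mer frequency vector, an alignment-free baseline swapped in so the example runs offline), not the Nucleotide Transformer or ChatNT. In practice you would pass your own `embed` function, for example one returning pooled model embeddings. The function names and toy sequences are made up for illustration.

```python
import math
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 3) -> list[float]:
    """Hypothetical stand-in embedding: normalized k-mer frequencies over ACGT.

    This is NOT a model embedding; swap in your own embed() function.
    k-mers containing other characters (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    vec = [float(counts[kmer]) for kmer in vocab]
    total = sum(vec) or 1.0
    return [v / total for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_candidates(query: str, candidates: dict[str, str], embed=kmer_profile):
    """Rank candidate sequences by similarity to the query, best first."""
    q = embed(query)
    scores = {name: cosine(q, embed(seq)) for name, seq in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy example: resolve a 1:many mapping by picking the most similar promoter.
query = "TATAAAAGGCGCGTATAAAA"
candidates = {
    "paralog_A": "TATAAAAGGCGCGTATAAAT",
    "paralog_B": "GGGCCCGGGCCCGGGCCCGG",
}
ranking = rank_candidates(query, candidates)
print(ranking[0][0])  # paralog_A
```

Similarity scores like these only suggest which candidate is closest. Whether a given model's embeddings actually reflect regulatory relatedness in eukaryotic genomes would need to be checked against known orthologs.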