by pfisherman 821 days ago
RNA is an obvious example. The examples and benchmarks they give in the paper are not the straw men that the DNA LLMs are beating the stuffing out of.

Also, CRE activity is highly cell-type specific. This article is a pretty awesome demonstration of model-guided design of cell-type-specific cis-regulatory elements.

https://www.biorxiv.org/content/10.1101/2023.08.08.552077v1

An LLM would not be able to do this because DNA itself contains no contextual information about cell type: every cell has a copy of the full genome. Epigenetic tracks, however, contain a lot of information germane to the cellular context, e.g. which parts of the genome are being transcribed.


But epigenetics is just DNA. It's state information stored directly in the DNA, or in directly attached machinery. From the perspective of learned models, those are just other features.

But realistically, the right source for transcription is the RNA in the cell, not the epigenetics. Nearly all cell type profiling is based on RNA. It's far easier and more reliable to interrogate the transcriptome than to try to gain info from epigenetic states.

Epigenetics is not just DNA; think of it more like the (hidden) state of DNA. Histone modifications, open chromatin, and other epigenetic readouts are like emissions / indicators of the hidden state.

The relationship is like that between the words in a book and the page that is actively being read. I know that is a hackneyed analogy, but the coffee is wearing off :)

Those are all readable using standard DNA sequencing techniques, so again, it's just state attributes of the DNA.

(I've worked in genomics for 30+ years. I'm not just spitballing here.)

DNA libraries are read out using sequencing techniques. But I doubt anyone would say they are just measuring DNA. It’s kind of like saying a luciferase-based assay is just measuring the intensity of light.

Yes, that is the terminology we use.