
by limpon 4612 days ago

I'll try to give you a description of what transcription factors do and why Stamatoyannopoulos's finding is important. I hope it doesn't sound too weird.

I just noticed how long my "brief" description is, so here is a TL;DR, comparing cells to computer programs: a program figures out by itself which functions to call based on the byte sequence of the function in memory, and changes these preferences based on the environment it's running in.

As you know, the DNA in our cells encodes the blueprint for all proteins that exist in the human body. I'm trying to compare this to string encoding, but might be completely wrong, because string encoding is more difficult to me than DNA encoding ;) The DNA is our hard drive. This hard drive holds the information for all proteins. A protein will be a string. Our alphabet is made out of 20 characters (21 if you count the rare selenocysteine), and they are called amino acids. Each character (amino acid) is encoded as a 3-bit sequence on the DNA, the so-called codon. Unlike the binary system, we have a base-4 system, meaning each bit can have 4 different states (A, T, C, G), the nucleotides. So, given a byte sequence from the hard drive (a gene on the DNA), we always look at codons (three bits) and can then translate those codons into amino acids. ATG translates to Methionine, GGA translates to Glycine, etc. With 4^3 (64) different codons but only about 20 amino acids, some amino acids can be represented by several different codons. On top of that, some codons have a special meaning, such as "\n". Those special codons are called "stop codons", because they mark the end of a protein and the program stops decoding from this point on. In addition, there is also a "start codon" (ATG, which also encodes Methionine) that marks the beginning of a string.
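To make the decoding step concrete, here is a minimal Python sketch of the analogy. The codon table is deliberately tiny (a handful of the 64 real codons), the function name `translate` is made up for this example, and it simply starts at the first ATG it finds, which is a simplification of what a cell does.

```python
# Toy codon table: only a few of the 64 codons, enough for the demo.
CODON_TABLE = {
    "ATG": "M",  # Methionine, also the start codon
    "GGA": "G",  # Glycine
    "GGC": "G",  # Glycine again: a second codon for the same amino acid
    "AAA": "K",  # Lysine
    "TGG": "W",  # Tryptophan
    "TAA": "*", "TAG": "*", "TGA": "*",  # stop codons
}


def translate(dna):
    """Decode from the first ATG up to (not including) the first stop codon."""
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon, nothing to decode
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "?")  # "?" = not in toy table
        if amino_acid == "*":
            break  # stop codon: end of the string
        protein.append(amino_acid)
    return "".join(protein)


print(translate("CCATGGGAAAATGGTAACC"))  # ATG GGA AAA TGG TAA -> MGKW
```

The letters before the ATG and after the TAA are ignored, just like bytes outside the string on the hard drive.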

When a cell wants to make a certain protein, it copies the byte sequence from the DNA (into messenger RNA), translates the copy into the actual amino acid sequence, and thereby forms the protein (prints a string in our example).

The problem we're now facing is that every cell in your body (with a few exceptions, of course) has the exact same DNA sequence. But still, nature is somehow able to make different cell types, such as a skin cell, a blood cell, a neuron, you get the point. The way cells "know" which proteins to make is that the hard drive doesn't only hold encoded strings but also some byte sequences in between that don't encode any strings. These sequences are called regulatory sequences. They attract so-called transcription factors. Transcription factors are proteins that you could think of as pointers. They point to some genes but not to others, and thus instruct the cell which byte sequences should be translated into strings.

These regulatory sequences attract not only transcription factors but also repressors. Those are proteins that prevent transcription factors from binding and thus ensure that certain proteins are not made.

There are hundreds of different transcription factors and repressors encoded on the DNA, but only a handful exist as proteins in any given cell, and they define the difference between cell types. They also come and go as the cell's environment changes.
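A small Python sketch of this idea, with everything in it invented for illustration: the factor names (`TF_A`, `TF_B`, `REP_X`), the gene-to-factor assignments, and the rule "a gene is on if at least one of its activators is present and none of its repressors is" are assumptions for the analogy, not real biology.

```python
# Hypothetical regulatory wiring: which factors can bind next to which gene.
GENES = {
    "keratin":    {"activators": {"TF_A"}, "repressors": {"REP_X"}},
    "hemoglobin": {"activators": {"TF_B"}, "repressors": set()},
    "actin":      {"activators": {"TF_A", "TF_B"}, "repressors": set()},
}


def expressed_genes(factors_present):
    """Return the sorted names of genes switched on by the factors in this cell."""
    switched_on = []
    for name, regulation in GENES.items():
        activated = bool(regulation["activators"] & factors_present)
        repressed = bool(regulation["repressors"] & factors_present)
        if activated and not repressed:
            switched_on.append(name)
    return sorted(switched_on)


# Same "hard drive" (GENES), different factors present, different output.
print(expressed_genes({"TF_A"}))
print(expressed_genes({"TF_B"}))
print(expressed_genes({"TF_A", "REP_X"}))
```

The table of genes never changes between calls; only the set of factors passed in does, which is the point of the "same DNA, different cell types" argument.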

So, going back to our string analogy, say you have saved thousands of strings on your hard drive but only want to show a subset of those to the user (e.g. language selection). You start your program by passing some information about which strings to choose. The same happens here as well, except that the information about which strings to choose is somehow intrinsic but still different between cell types. And on top of that, you don't have distinct languages; the languages overlap: language A uses strings 1, 2, 3; language B uses strings 2, 5, 6; language C uses strings 1, 2, 6; etc. Because of this complexity it is very difficult to understand how the cell "knows" which proteins are right.

Stamatoyannopoulos has now shown for the first time that even though two different codons encode the same amino acid, they might attract different transcription factors.

To get an understanding of the importance of this finding, imagine having a character encoding, such as DNA, where two different byte sequences encode the exact same string. Now imagine having a program that automatically chooses which strings to display based on which byte sequences encode certain characters.
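Here is a minimal Python sketch of that thought experiment. The two sequences, the toy codon table and the binding motif `GGAAA` are all made up for illustration; the only point is that two byte sequences can decode to the same string while only one of them contains the pattern a (hypothetical) transcription factor would recognize.

```python
# Toy codon table with two synonymous pairs (GGA/GGC and AAA/AAG).
CODON_TABLE = {"ATG": "M", "GGA": "G", "GGC": "G", "AAA": "K", "AAG": "K", "TGA": "*"}


def translate(dna):
    """Decode codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


MOTIF = "GGAAA"  # hypothetical sequence a transcription factor binds to

variant_1 = "ATGGGAAAATGA"  # ATG GGA AAA TGA
variant_2 = "ATGGGCAAGTGA"  # ATG GGC AAG TGA (synonymous codons swapped in)

for dna in (variant_1, variant_2):
    print(translate(dna), MOTIF in dna)
```

Both variants print the same protein, but only the first one contains the motif, so a factor that recognizes it could treat the two "identical" strings differently.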

And now, imagine that the DNA doesn't only encode strings, but also functions and closures, pointers, variables, etc. So each time a part of the DNA is translated, the state of the program changes. Nature is very complex and we're just at the beginning of understanding the code.

We have methods that let us comment out parts of the DNA and see how the program changes (mutations). In addition, we recently learned how to see which parts of the DNA are being used in certain cells. In this paper, they refined this method by actually looking at all pointers in a cell and figuring out which parts of the DNA they pointed at. And they did this for 81 different cell types to find networks that act together in certain settings.