by inciampati 701 days ago
It's so, so complex! I confess I had a sense of this but had no idea how far it went. We don't even hear which MSA algorithm is used to align the protein sequences.

Hi, I was one of the authors of this! I think we briefly mentioned this in a footnote somewhere (a lot of things got cut or moved to footnotes, since it is already so long and we wanted to focus on the ML parts that aren't described elsewhere).

But yes, as @Flobosg mentioned, for protein chains they use jackhmmer to search 4 of the databases (the exception is Uniclust30 + BFD, where HHblits is used instead). For RNA chains they use nhmmer to search, then hmmalign to re-align the hits to the query chain.
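
If it helps to see that routing written down, here is a rough sketch. The function name, the chain-type strings, and the database labels are all made up for illustration and are not taken from any AlphaFold code; it only encodes the tool choice described above, not how the tools are actually invoked.

```python
# Illustrative only: every name here is hypothetical, not from an AlphaFold codebase.
# Database labels that are searched with HHblits rather than jackhmmer.
HHBLITS_DATABASES = {"Uniclust30", "BFD"}


def pick_search_tools(chain_type, database=None):
    """Return the tool sequence described above for a chain type and database label."""
    if chain_type == "protein":
        # Uniclust30 + BFD is the exception that goes through HHblits.
        if database in HHBLITS_DATABASES:
            return ["hhblits"]
        return ["jackhmmer"]
    if chain_type == "rna":
        # Search with nhmmer, then re-align the hits to the query with hmmalign.
        return ["nhmmer", "hmmalign"]
    raise ValueError(f"unsupported chain type: {chain_type!r}")
```

So `pick_search_tools("rna")` gives the two-step nhmmer then hmmalign sequence, while a protein chain gets a single search tool that depends on the database.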

Hope that helps!

Input MSAs are generated with jackhmmer and HHblits and then further processed, if I recall AlphaFold’s paper correctly.