Hacker News
by kxs 2029 days ago
They trained on 170k sequences/structures/proteins; each sequence has tens to hundreds, or even thousands, of amino acids. Structure is much more conserved than sequence. Of the 100 targets, roughly a quarter have no similarity to known structures, so those shouldn't overlap with the training set. They did very well on those targets.