by matt4077 3040 days ago
CASP is a pretty nice dataset, and so is all of the PDB.
1 comment

The PDB represents the best we have, but I wouldn't call it a great dataset for learning. The 150,000 known structures are a drop in the ocean when it comes to the space of possible sequences/structures.
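
To put a rough number on "drop in the ocean", here is a back-of-the-envelope sketch. It assumes the standard 20-letter amino-acid alphabet and an illustrative chain length of 100 residues (my choice, not a figure from the thread), and it counts every possible sequence, most of which would never fold into a stable structure, so treat it as a loose upper bound on the space rather than a real estimate:

```python
import math

AMINO_ACIDS = 20            # standard amino-acid alphabet
KNOWN_STRUCTURES = 150_000  # figure quoted in the comment above
CHAIN_LENGTH = 100          # assumed, illustrative chain length


def log10_sequence_space(length, alphabet=AMINO_ACIDS):
    """log10 of the number of distinct sequences of the given length."""
    return length * math.log10(alphabet)


def log10_coverage(known, length, alphabet=AMINO_ACIDS):
    """log10 of the fraction of sequence space covered by `known` entries."""
    return math.log10(known) - log10_sequence_space(length, alphabet)


if __name__ == "__main__":
    space = log10_sequence_space(CHAIN_LENGTH)
    cover = log10_coverage(KNOWN_STRUCTURES, CHAIN_LENGTH)
    print(f"possible sequences of length {CHAIN_LENGTH}: about 10^{space:.1f}")
    print(f"fraction covered by {KNOWN_STRUCTURES} structures: about 10^{cover:.1f}")
```

Working in log10 keeps the numbers readable; the point is only the order of magnitude, which stays astronomically lopsided for any realistic chain length.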