Hacker News
by claytonius 1089 days ago
Part of what’s interesting here is that there haven’t been any robust featurizers for DNA in the same sense that we have robust featurizers for proteins (ProtBERT et al.) that just work out of the box on amino acid sequences. Many genes span >100k bp, whereas proteins are often under 1k AAs, so a DNA model needs a much longer context window. For comparison, ProtBERT gets ~200k downloads per month on Hugging Face and DNABERT gets ~10s.
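To put rough numbers on the context-window point, here is a back-of-the-envelope sketch. It assumes a 512-token window (a common BERT-style limit, not a figure from this thread) and one token per residue or per base, which is a simplification; real tokenizers (k-mers, BPE) would change the counts. The function name `windows_needed` is made up for illustration.

```python
import math

def windows_needed(seq_len: int, context: int = 512) -> int:
    """Non-overlapping windows needed to cover seq_len tokens.

    The 512 default is an assumed BERT-style limit.
    """
    return math.ceil(seq_len / context)

# Assumption: one token per amino acid / per base pair.
protein_tokens = 1_000    # a protein of ~1k AAs
gene_tokens = 100_000     # a gene of ~100k bp

print(windows_needed(protein_tokens))  # 2
print(windows_needed(gene_tokens))     # 196
```

Under those assumptions a typical protein fits in a couple of windows, while a single gene has to be chopped into roughly two hundred pieces, each of which loses sight of the others.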