by abeppu 1622 days ago
This is not my area, and I read the press release but not the paper, but I can't help noticing that they mention a sensitivity/recall number (>90%) but no specificity or precision number. Even if you're not trying to be cynical or skeptical about this, when there are so few true positive examples available, how can one plausibly do a good job of calibrating such a system?
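
To make the concern concrete, here is a rough sketch with made-up numbers. The counts, the 99% specificity and the variable names are all assumptions for illustration, not figures from the press release or the paper. It shows how a detector with 90% recall can still have poor precision when true positives are rare:

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from raw confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical setup: 10 truly high-risk variants among 10,000 sequences.
positives = 10
negatives = 9990
assumed_recall = 0.90       # matches the ">90%" style of claim
assumed_specificity = 0.99  # assumption; no such number was reported

tp = round(positives * assumed_recall)             # flagged and truly risky
fn = positives - tp                                # risky but missed
fp = round(negatives * (1 - assumed_specificity))  # flagged but harmless

precision, recall = precision_recall(tp, fp, fn)
print(f"tp={tp} fp={fp} fn={fn}")
print(f"recall={recall:.2f} precision={precision:.3f}")
```

Under these assumed numbers most flagged sequences would be false alarms, which is why a recall figure on its own doesn't say much.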
1 comment

That's the key bit here. Supervised learning is not applicable, and any sort of fitting to known labels is doomed to fail. The system is not calibrated, tuned or parameterized. The ML part learns in a self-supervised manner which spike sequence features matter for a successful (proliferating) coronavirus (or, more correctly, it learns low-dimensional embeddings of multi-point interactions/co-occurrences of different amino acids in spike proteins). The rest is based on either frequentist statistics or computational biochemistry. At no point during training is any information about sequences belonging to High Risk Variant classes (Variant of Concern, Variant of Interest etc.) fed to EWS.
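
As a toy illustration of what "self-supervised" means here (this is not the EWS model, which learns embeddings; the sequences and function names below are hypothetical), the sketch hides one residue and guesses it from co-occurrence counts with the other positions. The only training signal is the sequences themselves, and no risk labels appear anywhere:

```python
from collections import Counter, defaultdict

# Toy, unlabeled fragments (made-up strings, not real spike data).
sequences = ["NKGY", "NKGY", "NKGF", "DRGY", "DRGY", "DRGF"]

def fit_cooccurrence(seqs):
    """Count which residue b sits at position j when residue a sits at position i."""
    counts = defaultdict(Counter)
    for s in seqs:
        for i, a in enumerate(s):
            for j, b in enumerate(s):
                if i != j:
                    counts[(i, a, j)][b] += 1
    return counts

def predict_masked(counts, seq, masked_pos):
    """Guess the residue at masked_pos by summing votes from all other positions."""
    votes = Counter()
    for i, a in enumerate(seq):
        if i != masked_pos:
            votes.update(counts[(i, a, masked_pos)])
    return votes.most_common(1)[0][0] if votes else None

model = fit_cooccurrence(sequences)
guess = predict_masked(model, "N?GY", masked_pos=1)
print(guess)
```

A real system would replace the raw counts with learned low-dimensional embeddings, but the point carries over: the objective is built from the sequence data alone.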