That's the key bit here. Supervised learning is not applicable here, and any attempt to fit to known labels is doomed to fail. The system is not calibrated, tuned or parameterized. The ML part learns, in a self-supervised manner, which spike sequence features matter for a successful (proliferating) coronavirus (or, more correctly, it learns low-dimensional embeddings of multi-point interactions/co-occurrences of different amino acids in spike proteins). The rest is based on either frequentist statistics or computational biochemistry. At no point during training is EWS fed any information about which sequences belong to High Risk Variant classes (Variant of Concern, Variant of Interest, etc.).
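To make "self-supervised" concrete, here is a minimal sketch of how training examples can be built from unlabeled sequences alone. It assumes a masked-residue objective (hide one amino acid, predict it from the rest), which is one common form of self-supervision; whether EWS uses exactly this objective is an assumption on my part. The function name `masked_examples`, the `MASK` symbol, the `stride` parameter and the toy string are all hypothetical and only there for illustration.

```python
from typing import List, Tuple

MASK = "#"  # hypothetical mask symbol; not a real amino-acid code


def masked_examples(seq: str, stride: int = 1) -> List[Tuple[str, int, str]]:
    """Turn one unlabeled sequence into (masked_input, position, target) triples.

    The target is the residue that was hidden, so the only "label" is the
    sequence itself. No variant class (VOC, VOI, ...) appears anywhere.
    """
    examples = []
    for i in range(0, len(seq), stride):
        masked = seq[:i] + MASK + seq[i + 1:]  # hide the residue at position i
        examples.append((masked, i, seq[i]))
    return examples


# Short toy string of amino-acid letters, used purely for illustration.
toy = "MFVFLVLLPLVSSQ"
pairs = masked_examples(toy, stride=5)
for masked, pos, target in pairs:
    print(masked, pos, target)
```

A model trained to fill in the hidden residue has to pick up which amino acids tend to co-occur at which positions, and that is where the embeddings come from. Risk-class labels never enter the training data in this setup.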