Hacker News
by orintorynchus 1620 days ago
I am the second author of the paper. We have been putting our heads on the line for the last half a year. We detected Lambda, Mu (with the caveat that we did not consider it competitive) and Omicron - all blindly.

We have been verifying all our predictions experimentally, post factum. And the method is purely data-driven, with no fitting to the experiments or observations.

The first author came up with the approach, participated in the analysis and got his hands dirty like everyone else. While he is the CEO of a 160+ person company, this has been a labor of love for all of us, done to a large extent in the evenings, during weekends and holidays. It is indeed an unusual situation. But this was not a regular project and InstaDeep is not a regular company either.

Thank you for chiming in. It’s great that you detected them. I’m curious how many false positives you had during that same time? Did you detect many others that just didn’t pan out to be of significance?
This is something that amazes me. Any time a true High Risk Variant appears, it is clear as day in the system. This was the case with Lambda, Mu (which we predicted to have a limited propensity to proliferate), and now with Omicron. However, we also detect lineages that are potentially dangerous. As the classification is relative, there is no fixed threshold beyond which we would call an alert. In the evaluation, we have been looking at 20 sequences per week, as this was roughly the testing capacity of our partners. Going for ONE sequence a week makes us detect some variants a bit later, but we still detect them. The sequences and lineages we predicted to be of interest were predominantly spreading afterwards. Some were just a blip on the radar, though. We aimed at sensitivity and not missing ominous signs, rather than specificity. NB: Each week there are thousands (now 12k+) of new sequence variants. Out of them we consider 20, and detect most of the variants as early as the first day.
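To make the "relative, no fixed threshold" selection concrete, here is a minimal sketch of picking a weekly watchlist by rank instead of by cutoff. It is not the EWS implementation: the `ScoredSequence` type, the `weekly_watchlist` function, and the single numeric `score` are all hypothetical names assumed for illustration, and the thread does not say how the real scores are computed or combined.

```python
import heapq
from typing import Iterable, List, NamedTuple


class ScoredSequence(NamedTuple):
    """Hypothetical record: a sequence identifier and an assumed risk score."""
    seq_id: str
    score: float


def weekly_watchlist(scored: Iterable[ScoredSequence], k: int = 20) -> List[ScoredSequence]:
    """Return the k highest-scoring sequences of the week, best first.

    The selection is purely relative: no absolute score threshold is applied,
    so the list always has min(k, number of sequences) entries.
    """
    return heapq.nlargest(k, scored, key=lambda s: s.score)
```

With `k=20` this mirrors the 20-sequences-per-week testing capacity mentioned above, and with `k=1` the single-sequence setting. A rank-based pick like this favors sensitivity: something is always flagged, even in a quiet week.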
A massive thanks for the work you put in - I am sure it will pay off manifold. Assuming your models are robust over time this will clearly help the development pipeline of BioNTech / Pfizer to come up with adjusted vaccines should the need arise (i.e. an especially nasty mutant showing up). Shortening the detection from weeks (months?) to days is an order of magnitude gain in lead time. From a commercial point of view this agility will give BioNTech / Pfizer a lead over its competitors (assuming this is not public / shared information?)
What we hope for is rather to inform public policy. Now, any time a scary-looking variant is sequenced, there is some public commotion and uncertainty about the future repercussions. We want to be able to gauge the appropriate level of concern in these situations. It is not an all-knowing oracle, but rather a way to distill the insights from prior observations and simulations, the same way a human expert would do. This being said, we believe that the insights provided by EWS can be of use in designing new vaccines, as well as in deploying the existing ones most effectively.
Dear Marcin - congratulations on the manuscript!

Do you think the immune escape parameters would need to be retuned in a post-Omicron world? Do you need the actual epitopes recognised by antibodies, or can you guess this from structure?

Do you capture any aspects with respect to changes in spike glycosylation in your models?

Finally, as with another reply, do you have a guess about the specificity of this system? Is it good enough to get production of vaccines going on variants that are flagged, just in case?

The system is constantly learning, so it "retunes" itself. We can infer epitopes from structure, but we found that data derived from known complexes is sufficient for our purpose.

The current version of EWS does not explicitly account for glycosylation. It is implicitly handled by the ML models, though.

As for specificity, it is difficult to estimate. We know that for each of the named Variants of Concern, the signal from EWS was unmissable. Considering that any new vaccine would need to go through a stringent approval process, I don't think that EWS should be a major determining factor in the process. However, it can certainly help resolve the doubts.

Thanks for the info! That ability to retune sounds excellent. I wonder if you could spin this out to have an EWS for influenzas too (although hopefully it’ll never get as severe!)

I guess for the glycosylation when you say it’s implicitly handled, there’s an N-linked sequon pattern somewhere in the language model, which I guess covers a good deal of info :)
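For readers unfamiliar with the term, the N-linked sequon is the motif N-X-S/T, where X is any residue except proline. Below is a minimal sketch of scanning a protein sequence for it explicitly. The function name `find_n_linked_sequons` is made up for illustration, and this says nothing about what the EWS language model actually learns; it only shows the pattern being referred to.

```python
import re
from typing import List

# N-X-S/T, where X is any residue except proline (P).
# The lookahead consumes only the N, so overlapping motifs are all reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_n_linked_sequons(protein: str) -> List[int]:
    """Return 0-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() for m in SEQUON.finditer(protein.upper())]


# Toy sequence, not a real spike fragment.
print(find_n_linked_sequons("MNVTANPSNNST"))  # → [1, 8, 9]
```

A matching motif marks only a potential glycosylation site; whether it is actually glycosylated depends on context that a pattern match cannot capture.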

Jury still out on effects of O-linked glyco on spike, but give me a shout if you’re interested in it!

Anyway, cool work, and all the best in where you’re taking the work next!

Wow, amazing to see you here. I have somewhat of a bioinformatics background, and this is the sort of work that I always found very interesting - do you think your team will produce some sort of high level architectural and process breakdown at some point?
I am sure we will, in due time. If you have any particular hopes or wishes, we will try to accommodate as much as we can.
Thanks for adding this information. I was unfair to judge so harshly and be so sceptical.

So the $1T question is ... what (if anything) is this model currently predicting about the next variant?

The method, in its current incarnation, is detecting dangerous variants and not foreseeing them. We can use the same approach to forecast plausible developments, but there are so many latent variables (intrapatient evolution, mobility, vaccination status, restriction compliance) that it is more of an informed "What If?" exercise than a true prediction. As with many good stories about predicting the future, we can only see what is likely to happen, not what will necessarily happen... Out of many paths only a few will be explored in the end :-)