What is the training data, and have you thought about how to build a causal model that would be robust to interventions (e.g., changing or adding a gene)?
This is a great point; in fact, it is the main purpose of the model. To check its predictive capability, the model is validated against genomes with variations it never saw during training.
AFAIK, variants are created by changing, adding, or shuffling already known genes, not by creating an unknown mutation of a gene we have never seen before. Therefore, our thesis is that, with enough information, we can capture the effect of each gene on the final product.
Of course, there is epigenetics as well, which is tackled with additional information about the environment.
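
To make the validation idea concrete, here is a minimal sketch of a split in which the held-out genomes carry variations that never appear in training. It is an illustration only, not our actual pipeline: the function name `split_by_variant`, the `(variant_id, features, target)` sample layout, and the 20% test fraction are all assumptions made for the example.

```python
import random
from collections import defaultdict


def split_by_variant(samples, test_fraction=0.2, seed=0):
    """Split samples so that no variant appears in both train and test.

    Each sample is assumed to be a (variant_id, features, target) tuple;
    this layout is hypothetical and only serves the illustration.
    """
    # Group samples by the variant they come from.
    by_variant = defaultdict(list)
    for sample in samples:
        by_variant[sample[0]].append(sample)

    # Shuffle variant IDs (not individual samples) reproducibly.
    variant_ids = sorted(by_variant)
    random.Random(seed).shuffle(variant_ids)

    # Hold out whole variants, at least one.
    n_test = max(1, round(len(variant_ids) * test_fraction))
    test_ids = set(variant_ids[:n_test])

    train = [s for v in variant_ids if v not in test_ids for s in by_variant[v]]
    test = [s for v in variant_ids if v in test_ids for s in by_variant[v]]
    return train, test
```

The point of splitting on variant IDs rather than on individual samples is that a random per-sample split would leak each variation into training, so the test score would no longer say anything about unseen variations.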