by leelin
2600 days ago
Maybe a dumb question from a non-medical guy: are medical images considered "stationary" from a stats viewpoint? That is, will medical images of diseases we diagnose in the next 20 years look a lot like the ones from the past 20 years, or is there a danger of over-fitting on an evolving data set? Could either the technology or the biology of the disease evolve? In a prior life I was a quant trader, and financial market data is notoriously non-stationary. On top of market rules and structures changing all the time, once someone discovers a profitable trading idea, their own actions change what the data looks like for everyone else from that point forward.
Example #1: Let's say that cancer rates are increasing over time and cameras are improving over time. You might end up with a weird artifact in your model that higher resolution images are more likely to indicate cancer.
Example #2: Let's say that cancer-detecting algorithms are widely successful and so someone makes an app that lets you upload images of skin and the app tells you the probability of you having cancer. Suddenly a model that was trained on suspicious lesions is being used on normal freckles that people uploaded for fun. You end up with a lot of false positives. Maybe you try to combat that by including images uploaded to the app (that you somehow obtain labels for). But now you have a model that predicts that photos taken in brightly lit medical offices are likely to be cancer and blurry images taken in bathroom mirrors are not cancer.
You could argue that Example #2 is more about the difference between training data and the data being scored, but the fact remains that outside of tightly controlled scenarios, the way data is collected nearly always changes over time and ends up affecting model performance in unexpected ways.
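To make Example #1 concrete, here is a rough toy simulation in Python. Everything in it is made up for illustration: the 20-year window, the cancer rates, the "resolution" scale, and the function names `simulate` and `pearson` are all assumptions, not real medical data. Within any single year, resolution and the label are generated independently, yet pooling all years together produces a clearly positive correlation, purely because both drift upward over time.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(years=20, per_year=2000, seed=0):
    """Return one list of (resolution, label) rows per year.

    Assumed toy dynamics: the cancer rate and the typical camera
    resolution both rise with the year, but within a year the label
    is drawn independently of the resolution.
    """
    rng = random.Random(seed)
    by_year = []
    for t in range(years):
        p_cancer = 0.05 + 0.02 * t  # hypothetical rising base rate
        rows = []
        for _ in range(per_year):
            resolution = 1.0 + 0.5 * t + rng.gauss(0.0, 1.0)  # hypothetical units
            label = 1 if rng.random() < p_cancer else 0
            rows.append((resolution, label))
        by_year.append(rows)
    return by_year


by_year = simulate()

# Correlation when all years are thrown into one training set.
pooled = [row for rows in by_year for row in rows]
pooled_r = pearson([r for r, _ in pooled], [y for _, y in pooled])

# Average correlation when each year is looked at separately.
within_r = sum(
    pearson([r for r, _ in rows], [y for _, y in rows]) for rows in by_year
) / len(by_year)

print(f"pooled correlation:      {pooled_r:.3f}")
print(f"within-year correlation: {within_r:.3f}")
```

A model fit on the pooled data would happily pick up resolution as a "signal" for cancer, even though it tells you nothing once you hold the year fixed. Comparing pooled and per-period statistics like this is one cheap way to spot that kind of drift artifact.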