by maximeago
1557 days ago
Here is how it would work in theory (leaving aside the scalability question of working with heavy DICOM files and huge DNNs). I'm assuming your data is made of records composed of an image and some information about the image or the patient. The system will generate a fake dataset with the exact same structure and schema (the information on patients is realistic, and the images look reasonable and, importantly, have the right encoding, size, etc.). The purpose of this fake data is to let the vendor adjust their algorithm so it can consume your data as it is. The vendor builds the preprocessing on the fake data and then submits their data job to the API (say, a preprocessing function to be applied to each record and a TensorFlow model to be fitted on the data, or just evaluated on it to measure performance). The preprocessing code runs on the original records, and the model is trained or validated against the real data. In the end they can prove the value of their model without having to get their hands on the real data.
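To make the flow above concrete, here is a rough sketch in plain Python. Everything in it is hypothetical and only for illustration: `Record`, `make_fake_records`, `run_job`, the tiny "images" and the threshold "model" are made-up stand-ins, not the actual API, and the fake data generator here is naive random sampling rather than anything privacy-preserving.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Record:
    pixels: List[List[int]]  # stand-in for a decoded image (8-bit grayscale)
    age: int                 # stand-in for patient metadata
    label: int               # ground truth, 0 or 1


def make_fake_records(real: Sequence[Record], n: int, seed: int = 0) -> List[Record]:
    """Data owner side: fake records with the same schema and image shape.

    Assumption: values are drawn uniformly at random from fixed ranges;
    a real system would generate more realistic content.
    """
    rng = random.Random(seed)
    rows, cols = len(real[0].pixels), len(real[0].pixels[0])
    return [
        Record(
            pixels=[[rng.randint(0, 255) for _ in range(cols)] for _ in range(rows)],
            age=rng.randint(18, 90),
            label=rng.randint(0, 1),
        )
        for _ in range(n)
    ]


def run_job(
    real: Sequence[Record],
    preprocess: Callable[[Record], List[float]],
    predict: Callable[[List[float]], int],
) -> Dict[str, float]:
    """Data owner side: run the vendor's code on the real records.

    Only an aggregate result is returned; the records never leave.
    """
    correct = sum(predict(preprocess(r)) == r.label for r in real)
    return {"n": len(real), "accuracy": correct / len(real)}


# Vendor side: written and debugged against the fake records only.
def preprocess(record: Record) -> List[float]:
    flat = [p for row in record.pixels for p in row]
    return [sum(flat) / len(flat) / 255.0, record.age / 100.0]


def predict(features: List[float]) -> int:
    # Toy "model": bright images are classified as positive.
    return int(features[0] > 0.5)


# Real data, visible to the data owner only.
real = [
    Record([[200, 200], [200, 200]], 64, 1),
    Record([[50, 50], [50, 50]], 41, 0),
    Record([[200, 200], [200, 200]], 57, 1),
    Record([[50, 50], [50, 50]], 33, 0),
]

fake = make_fake_records(real, n=5)                      # handed to the vendor
features_on_fake = [preprocess(r) for r in fake]         # vendor checks the pipeline runs
result = run_job(real, preprocess, predict)              # vendor only sees this summary
```

The point of the split is that the vendor's functions only ever touch `fake` on their side, while `run_job` is the single place where code meets `real`, and it hands back a summary rather than records.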
I've not encountered differential privacy in my work before now, but at least for dealing with the metadata in DICOM files it could probably be helpful for some datasets. It could still be challenging to ensure the IODs are correct (or that known quirks are preserved), though. Anyway, this is very interesting. I have a colleague who is working on some utilization/value research using billing records, and I'll show him this.