if the issue is sensitive data in a training dataset, perhaps that should be addressed rather than accommodated.
Just say medical data is not open source and make the rest really open source
the problem is the source medical data itself is insufficiently cleansed. (if it can be at all.)
ideally the medical data is open source, but only contains what’s necessary, and not what’s sensitive.
that is, obviously, messy…
if the issue is sensitive data in a training dataset, perhaps that should be addressed rather than accommodated.