by rstevens24 2170 days ago
Great post! Machine learning definitely has a lot of potential to assist in medical diagnostics, and with all the training data coming out, it's a field ripe for innovation.

I work at Innolitics, and we do a lot of work with machine learning in the medical imaging space. We've homed in on a set of tools that works well for us; I thought it might be worth sharing in case anyone else wants to explore this space in light of COVID.

The referenced UC San Diego dataset has its images stored as PNGs, but if anyone is interested in doing more ML work with medical images, you'll probably find most of them in the DICOM file format. I can highly recommend using the dicom-numpy library for easy conversion of DICOM files into numpy arrays: https://github.com/innolitics/dicom-numpy. For more general example datasets saved in the DICOM format, The Cancer Imaging Archive is always an excellent resource: https://www.cancerimagingarchive.net/collections/
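
For anyone who wants to try the conversion, here is a minimal sketch of loading one series into a volume. It assumes the slices of a single series sit in one directory and end in `.dcm` (many DICOM files have no extension, so adjust the pattern). `list_dicom_files` and `load_volume` are hypothetical helper names. As far as I know, `dicom_numpy.combine_slices` returns the voxel array together with the affine, but check the library's docs for your version.

```python
from pathlib import Path


def list_dicom_files(directory, pattern="*.dcm"):
    """Return the matching files in a directory, sorted by name.

    The "*.dcm" pattern is an assumption; DICOM files often lack an extension.
    """
    return sorted(Path(directory).glob(pattern))


def load_volume(directory):
    """Read every slice in the directory and combine them into one 3D array."""
    # Imported lazily so the file-listing helper works without these packages.
    import pydicom
    import dicom_numpy

    datasets = [pydicom.dcmread(path) for path in list_dicom_files(directory)]
    # combine_slices orders the slices itself and returns (voxels, affine).
    voxels, ijk_to_xyz = dicom_numpy.combine_slices(datasets)
    return voxels, ijk_to_xyz
```

Calling `load_volume("path/to/series")` on a real series should hand back an array you can pass straight to Keras or PyTorch preprocessing.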

Another advantage of using DICOM files is that there's lots of metadata you can extract from each file to train on a wider clinical context. The PyDicom library makes that very straightforward: https://github.com/pydicom/pydicom
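
A rough sketch of pulling a few header fields into a plain dict, which you could then join onto your training table. The keyword list is just an illustrative assumption (they are standard DICOM keywords, but which ones matter depends on your task), and `extract_metadata` and `read_metadata` are hypothetical helper names.

```python
from types import SimpleNamespace

# Standard DICOM keywords; the choice of fields is an assumption for illustration.
KEYWORDS = ("Modality", "PatientAge", "PatientSex", "SliceThickness", "PixelSpacing")


def extract_metadata(dataset, keywords=KEYWORDS):
    """Map each keyword to its value, or None when the file does not carry it."""
    return {keyword: getattr(dataset, keyword, None) for keyword in keywords}


def read_metadata(path, keywords=KEYWORDS):
    """Read only the header of a DICOM file and extract the chosen fields."""
    import pydicom  # imported lazily so extract_metadata works without it

    # stop_before_pixels skips the image data, which keeps header scans fast.
    dataset = pydicom.dcmread(path, stop_before_pixels=True)
    return extract_metadata(dataset, keywords)


# Stand-in object for demonstration; a real pydicom dataset is used the same way.
example = SimpleNamespace(Modality="CT", SliceThickness=2.5)
metadata = extract_metadata(example)
```

Missing fields come back as `None` rather than raising, which is handy because real-world files are inconsistent about which tags they include.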

The Python + PyDicom + Keras or PyTorch stack is really powerful and easy to get started with. We use it at Innolitics frequently and put together some tutorial articles to demonstrate how to get started: https://innolitics.com/articles/ct-slice-localizer/

I'm excited to see more projects like this! More data and improved tools are only going to improve our ability to gain new insights into COVID.

1 comments

What advantage does dicom-numpy offer? I've mostly developed with pydicom for my medical imaging pipeline, as it lets me retain important DICOM information (pixel spacing, etc.). In fact, the `pixel_array` attribute returns a NumPy array that I can then use.
dicom-numpy's biggest advantage is that it combines individual slices into a single 3D numpy volume. This makes it really easy to jump straight into performing operations at the volume level rather than the slice level. It also performs some sanity checks for you, such as checking for missing slices or uneven slice spacing.
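
To give a feel for what that involves, here is a simplified sketch of sorting slices, checking the spacing, and stacking them. This is not dicom-numpy's actual code. It assumes you have already reduced each slice to a (position in mm along the slice normal, 2D array) pair, and the tolerance and the choice to stack along the last axis are arbitrary assumptions.

```python
import numpy as np


def stack_slices(slices, tolerance=1e-3):
    """Sort (position_mm, image) pairs and stack them into a 3D volume.

    Raises ValueError when the gaps between slices are not uniform, which is
    what a missing slice or uneven spacing looks like.
    """
    ordered = sorted(slices, key=lambda item: item[0])
    if len(ordered) < 2:
        raise ValueError("need at least two slices to form a volume")

    positions = np.array([position for position, _ in ordered], dtype=float)
    gaps = np.diff(positions)
    # A zero gap means duplicate slices; unequal gaps mean a hole in the series.
    if gaps[0] <= 0 or not np.allclose(gaps, gaps[0], rtol=0, atol=tolerance):
        raise ValueError("missing slice or uneven slice spacing")

    volume = np.stack([np.asarray(image) for _, image in ordered], axis=-1)
    return volume, float(gaps[0])
```

The returned spacing is worth keeping around, since you need it for any physical measurement or resampling later on.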

I've also found dicom-numpy useful for returning the ijk-to-xyz affine transformation matrix, which describes how the voxels are oriented in patient coordinate space. dicom-numpy builds on top of PyDicom, so they are definitely not mutually exclusive! We use them both extensively.
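
In case the affine is unfamiliar, here is a sketch of how such a matrix can be assembled from the standard DICOM attributes (ImageOrientationPatient, PixelSpacing, ImagePositionPatient), following the image-plane equation in the DICOM standard. It is not dicom-numpy's implementation. It assumes i is the column index, j the row index, k the slice index, and that slices advance along the cross product of the two in-plane directions. `ijk_to_xyz_affine` is a hypothetical name.

```python
import numpy as np


def ijk_to_xyz_affine(orientation, pixel_spacing, first_position, slice_spacing):
    """Build a 4x4 matrix mapping voxel indices (i, j, k) to patient (x, y, z) in mm."""
    # First three values: direction in which the column index increases.
    row_cosine = np.asarray(orientation[:3], dtype=float)
    # Last three values: direction in which the row index increases.
    column_cosine = np.asarray(orientation[3:], dtype=float)
    # Assumption: slices are ordered along the normal of the image plane.
    slice_cosine = np.cross(row_cosine, column_cosine)

    # PixelSpacing is [spacing between rows, spacing between columns].
    row_spacing, column_spacing = pixel_spacing

    affine = np.eye(4)
    affine[:3, 0] = row_cosine * column_spacing
    affine[:3, 1] = column_cosine * row_spacing
    affine[:3, 2] = slice_cosine * slice_spacing
    affine[:3, 3] = first_position  # ImagePositionPatient of the first slice
    return affine


# Axial example: voxel (i=2, j=4, k=3) in homogeneous coordinates.
affine = ijk_to_xyz_affine(
    orientation=[1, 0, 0, 0, 1, 0],
    pixel_spacing=(0.5, 0.75),
    first_position=(-100.0, -120.0, 30.0),
    slice_spacing=2.0,
)
xyz = affine @ np.array([2, 4, 3, 1])
```

Multiplying the matrix by `[i, j, k, 1]` gives the physical location of that voxel, which is what you need to line up volumes from different scans.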

Does it handle cine images easily?