A key part of developing machine learning models for medical imaging is first selecting the relevant scans to train the model. And the number of scans available for developing models continues to grow. At Nines we've developed a more scalable approach to identifying the image type of a scan using modern deep learning algorithms.
The volume of created and transferred digital medical images continues to swell. According to a study published in JAMA¹, among older adults CT imaging rates were 428 per 1000 person-years in 2016 vs 204 per 1000 person-years in 2000 in US health care systems. One problem is that when it comes to organizing those images, the use of the DICOM standard is not so standard.
We in the academic and medical research community are grateful that some of this swell of large, appropriately anonymized image data is available as we build new service models. Many medical machine learning algorithms in academic literature were developed and measured using tailored, anonymized public datasets such as the Lung Image Database Consortium image collection (LIDC-IDRI) or ChestX-ray8 from the NIH.
At Nines we're building computer vision models that can analyze radiology images, and we use data responsibly approved and anonymized from select hospital providers. We curate diverse datasets in terms of institutions, scanner types, etc. to test the generalizability of our algorithms. But the benefits of this dataset diversity come with a downside: the data usually hasn't been curated as thoroughly as a public dataset or with machine learning in mind. For example, the process of anonymization can introduce inconsistent associations with DICOM files.
Developing a model requires selecting relevant scans, yet this seemingly straightforward step is nontrivial. Why? A label such as image type often is not directly associated with the DICOM files. It must be inferred, historically from the DICOM header fields. And in practice, these fields often are not standardized across hospitals, manufacturers and years.
While developing NinesAI™, our CT head emergent triaging device, we needed to select scans of one image type: axial non-contrast head CT. But if we were to develop a device to analyze lung nodules, we'd need to select chest CT scans with convolution kernels for lungs -- a different kind of image type. Ultimately, we need to associate an image type label with every image in our dataset of hundreds of thousands of scans. Yet the non-standardization mentioned earlier makes that labeling complex and inefficient at this scale.
We use a three-part organization method that differs from historical approaches.
Let’s look at a historical approach to identifying the image type of a scan and contrast it with Nines’ method. It’s useful to break the procedures down into three main steps: Data Understanding, Filter Construction, and Filter Application.
Data Understanding is the process of identifying which scans are of the desired image type and which are not, given a subset of a large dataset.
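As a rough illustration of this step, the sketch below tallies how one header field varies among scans that a reviewer has already sorted into "desired type" and "not desired type". The records, the `is_target` flag, and the helper name are hypothetical, and headers are represented as plain dictionaries rather than parsed files. `SeriesDescription` is a standard DICOM attribute keyword.

```python
from collections import Counter

# Hypothetical reviewed subset: each record holds a header (as a dict) and a
# reviewer's decision on whether the scan is the desired image type.
reviewed = [
    {"header": {"SeriesDescription": "AXIAL HEAD W/O"}, "is_target": True},
    {"header": {"SeriesDescription": "Ax Brain 5mm"}, "is_target": True},
    {"header": {"SeriesDescription": "AXIAL HEAD W/O"}, "is_target": True},
    {"header": {"SeriesDescription": "SAG REFORMAT"}, "is_target": False},
    {"header": {}, "is_target": False},  # field missing, e.g. after anonymization
]


def field_values_by_label(records, field):
    """Count the values of one header field, split by reviewer decision."""
    counts = {True: Counter(), False: Counter()}
    for record in records:
        value = record["header"].get(field, "<missing>")
        counts[record["is_target"]][value] += 1
    return counts


summary = field_values_by_label(reviewed, "SeriesDescription")
for label, counter in summary.items():
    print(label, counter.most_common())
```

A summary like this makes the non-standardization visible: the same image type can appear under several spellings, and some scans carry no value at all.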
Filter Construction is the process of using the DICOM data from the previous step to build an algorithm that can identify the image type in scans.
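To make the historical, rule-based flavor of this step concrete, here is a minimal sketch of a filter for axial non-contrast head CT. It is not the filter used at Nines. The rules, the keyword list, and the orientation tolerance are all assumptions chosen for illustration, and headers are again plain dictionaries. The keys are standard DICOM attribute keywords.

```python
def normalize(value):
    """Lowercase and strip a header value; treat missing values as empty."""
    return str(value or "").strip().lower()


def is_axial_noncontrast_head_ct(header):
    """Hypothetical rule-based filter over a DICOM header given as a dict."""
    if normalize(header.get("Modality")) != "ct":
        return False
    body_part = normalize(header.get("BodyPartExamined"))
    description = normalize(header.get("SeriesDescription"))
    # Sites label the head differently, so accept several spellings (assumed list).
    if not any(k in body_part or k in description for k in ("head", "brain")):
        return False
    # Assumption: a populated contrast agent field means a contrast scan.
    if normalize(header.get("ContrastBolusAgent")):
        return False
    # Simplification: treat a slice as axial when its row/column direction
    # cosines are close to (1, 0, 0, 0, 1, 0); the 0.1 tolerance is arbitrary.
    orientation = header.get("ImageOrientationPatient")
    if orientation is None or len(orientation) != 6:
        return False
    axial = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)
    return all(abs(float(a) - b) < 0.1 for a, b in zip(orientation, axial))


well_labeled = {
    "Modality": "CT",
    "BodyPartExamined": "HEAD",
    "SeriesDescription": "Axial 5mm",
    "ContrastBolusAgent": "",
    "ImageOrientationPatient": [1, 0, 0, 0, 1, 0],
}
# Same kind of scan, but the site left the descriptive fields uninformative.
poorly_labeled = dict(well_labeled, BodyPartExamined="", SeriesDescription="ROUTINE 5.0")

print(is_axial_noncontrast_head_ct(well_labeled))
print(is_axial_noncontrast_head_ct(poorly_labeled))
```

The second header shows why such filters are brittle: the rules depend entirely on how each site fills in the fields, so every new institution or scanner tends to require new special cases.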
It is important to note that while the binary classification CNN model is more reliable than the rule-based filters, it still has a non-zero error rate. As such, we should apply this automated method only to training datasets, which are the ones that need to scale. Validation and testing datasets should be annotated manually by radiologists to ensure correctness.
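One way to enforce that policy in a labeling pipeline is sketched below. The function name, the split names, and the `predict` callable (a stand-in for the CNN classifier) are hypothetical. The only idea taken from the text is that model predictions are accepted for training scans, while validation and test scans must carry a radiologist's label.

```python
def assign_image_type_labels(scans, predict, manual_labels):
    """Label scans, using model predictions only for the training split.

    scans: iterable of (scan_id, split) pairs, split in {"train", "val", "test"}.
    predict: callable scan_id -> bool, standing in for the CNN classifier.
    manual_labels: dict scan_id -> bool from radiologist annotation.
    """
    labels = {}
    for scan_id, split in scans:
        if scan_id in manual_labels:
            # Assumption: a radiologist's label wins whenever one exists.
            labels[scan_id] = manual_labels[scan_id]
        elif split == "train":
            labels[scan_id] = predict(scan_id)
        else:
            # Never fall back to the model for validation or test data.
            raise ValueError(f"{split} scan {scan_id!r} needs a radiologist label")
    return labels


scans = [("scan-a", "train"), ("scan-b", "val")]
labels = assign_image_type_labels(
    scans,
    predict=lambda scan_id: True,  # dummy classifier for the example
    manual_labels={"scan-b": False},
)
print(labels)
```

Raising an error for an unlabeled validation or test scan, rather than silently using the model, keeps classifier mistakes from leaking into the numbers used to evaluate downstream models.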
While this method of using a binary classification CNN model was not used in the development of NinesAI, we plan to include this approach in current and future ML model development at Nines.
A key part of the medical imaging model development process is first selecting the relevant scans to use from a large dataset. This can be nontrivial because an image type label often is not directly associated with the DICOM files and must be inferred.
We've seen how Nines’ approach of using the images directly via a CNN model is a simpler and more scalable way to infer image type than the historical approach of a rule-based filter relying on the DICOM fields.
We developed this approach by working closely with Nines radiologists, especially during the Data Understanding step. At Nines we’re investigating new ways to assist radiology workflows, using modern computer vision image type classifiers similar to those discussed here.