i-RadioDiagno: Human-centered AI Medical Imaging Diagnosis Tool

Research Project Ongoing
PI: Ying Ding, School of Information/Dell Medical School, University of Texas at Austin
Co-PI: Nick Bryan, Department of Radiology, Dell Medical School, University of Texas at Austin

Abstract: The shortage of radiologists and physician burnout create an urgent demand for immediate solutions. In this project, we will build an open-source, human-centered medical imaging diagnosis tool (called i-RadioDiagno) that adds the prior knowledge of radiologists (e.g., radiomics) to deep learning algorithms to enable automatic or semi-automatic generation of diagnosis notes from medical images. i-RadioDiagno will be integrated and tested in clinical practice, enabling radiologists and machines to work together through feedback loops that improve accuracy and support adaptive learning. It will be built on Amazon SageMaker and Apache MXNet on AWS.

Keywords: medical imaging, human-centered, radiomics, prior knowledge of radiologists, diagnosis notes

Introduction

Medical Imaging

Artificial intelligence (AI) is revolutionizing healthcare and, in particular, medical imaging. Health innovations applying machine learning (ML) and deep learning (DL) in radiology account for more than half of all AI innovations in health. Advancing AI in medical imaging brings extraordinary benefits: better accuracy, lower cost, and higher efficiency. ML is the part of AI wherein machines derive patterns from data without being explicitly programmed. Among ML algorithms, DL imitates the neural networks of the human brain, generating representations from artificial neural networks (ANNs) with anywhere from one to thousands of layers. Derived from ANNs, convolutional neural networks (CNNs) have been widely applied in computer vision. Convolution operators slide learned kernels over the image and generate feature maps from the intensities of each pixel/voxel; different kernels can capture features such as blurriness, texture, and sharpness. But current DL on medical imaging has several issues, such as overfitting and the demand for large labeled training datasets. Because ML and DL depend on training data, they also raise important questions about the privacy of sensitive information when applied in health. Moreover, DL methods operate as a black box, and their lack of interpretability has become a major bottleneck.

Prior Knowledge of Radiologists

Radiologists look at medical images along three dimensions: mass, energy, and time. In medical imaging, measurements of mass and energy are called signals, and an image can be defined as a rendering of spatially and temporally defined signal measurements. Images include explicit spatial information that is intrinsic and critical to understanding the patient and his/her disease. Radiologists use pattern recognition on medical images to make diagnostic decisions (Bryan, 2010). Pattern recognition includes pattern learning, which physicians acquire during their education and training, and pattern matching, in which physicians apply learned patterns to unknown images to detect abnormalities. Signals are measured by the relative brightness of pixels or voxels. The signal intensities (SIs) of different tissues and spatial features, including number, size, shape, and anatomic location, constitute important prior knowledge of radiologists. For a fully trained radiologist, most pattern matching tasks take less than a second. This knowledge can be captured by machines or extracted from medical images using algorithms; this fast-growing field is called radiomics.

Radiomics

Radiomics comprises features extracted from medical images using computational algorithms; most of these features relate to the prior knowledge of radiologists. Unlike the intrinsic knowledge of radiologists, radiomic features are quantitative and large in scale (sometimes exceeding a thousand features). We can therefore view radiomics as the quantified prior knowledge of radiologists. Radiomics plays an important role in precision medicine by finding associations among quantitative information extracted from clinical images to support evidence-based clinical decision making. Many kinds of features can be derived from clinical images, including lesion shape, the voxel intensity histogram, and the spatial arrangement of intensity values at the voxel level (texture). There are five groups of radiomic features: size- and shape-based features; descriptors of the image intensity histogram; descriptors of the relationships between image voxels, i.e., textures derived from the gray-level co-occurrence matrix (GLCM), run length matrix (RLM), size zone matrix (SZM), and neighborhood gray tone difference matrix (NGTDM); textures extracted from filtered images; and fractal features (Parmar, 2015). They can be extracted either directly from the images or after applying different filters or transforms (e.g., the wavelet transform). Radiomics can generate detailed quantification of tumor phenotype (Nicolasjilwan, 2014). Studies report associations between radiographic imaging phenotypes and tumor stage, metabolism, and gene or protein expression profiles (Ganeshan et al., 2010, 2013).
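To make the texture group concrete, the following minimal sketch computes a GLCM and a few derived descriptors with scikit-image on a toy region of interest; the array values and parameters are illustrative only, not taken from any study cited here.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops  # scikit-image >= 0.19

# Toy 4x4 region of interest with four gray levels, standing in for a lesion.
roi = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]], dtype=np.uint8)

# Count co-occurrences of gray-level pairs for horizontally adjacent voxels
# (distance 1, angle 0), then normalize to a joint probability matrix.
glcm = graycomatrix(roi, distances=[1], angles=[0], levels=4,
                    symmetric=True, normed=True)

# Scalar texture descriptors summarizing the matrix.
for prop in ("contrast", "homogeneity", "energy", "correlation"):
    print(prop, float(graycoprops(glcm, prop)[0, 0]))
```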

Goal

The shortage of radiologists and physician burnout create an urgent demand for immediate solutions. A radiologist reads about 20,000 images a year, roughly 50-100 per day, and the number is increasing. The US produces 600 billion images each year, and 31% of American radiologists have experienced at least one malpractice claim, often for missed diagnoses (Topol, 2019). Building automatic or semi-automatic approaches that streamline the path from medical images to diagnosis notes is the unavoidable next step.

In image detection, the significant difference between humans and machines lies in classifying versus quantifying image features. Humans tend to classify image features categorically, while machines are adept at numeric calculation of features. Each approach has different strengths and weaknesses, but the combination of the two could offer better solutions. Integrating machine and human medical image detection is a novel way to improve the accuracy of AI algorithms, and it may enable interpretable AI by leveraging the prior knowledge of radiologists. Furthermore, beyond the "black box" features of AI algorithms, radiologists can incorporate non-image data from patients' medical records and social determinants data into the diagnosis process. The prior knowledge of radiologists, and radiomics in general, has not been implemented in the layers of neural networks, nor anywhere in the deep learning workflow from medical images to diagnosis notes (Liu et al., 2019).

In this project, we will build an open-source, human-centered medical imaging diagnosis tool (called i-RadioDiagno) that adds the prior knowledge of radiologists (e.g., radiomics) to deep learning algorithms to enable automatic or semi-automatic generation of diagnosis notes from medical images. To integrate i-RadioDiagno into the clinical decision support system, the tool will enable radiologists and machines to work together through feedback loops that improve accuracy and support adaptive learning (see Figure 1). Integrating DL methods into the workflow of the clinical decision support system is critical: smart ML algorithms built for clinical use often turn out to be practically useless for clinicians. This project aims to engage the end-users (clinicians) in the process of developing the DL algorithms; it is important to include the feedback loop in the design process and to plan the integration with the clinical decision support system from day one. Working closely with the radiology department and clinical practice at Dell Medical School at the University of Texas at Austin will help ensure the success of this project.

Prior Art

Medical imaging is now routinely used to predict treatment response in cancer care (Nasief et al., 2019). Radiomics extracts quantitative data from medical images to represent tumor phenotype, such as the spatial heterogeneity of a tumor and spatial response variations. Eilaghi et al. (2017) found that the CT texture feature dissimilarity is associated with overall survival in pancreatic cancer. Chen et al. (2017) showed that first-order radiomic features (e.g., mean, skewness, and kurtosis) correlate with pathological responses to cancer treatment. Radiomic studies have demonstrated the effectiveness of image-based biomarkers for cancer staging and prognostication. Huang et al. (2018) showed that radiomics can increase the positive predictive value and reduce the false-positive rate in lung cancer screening for small nodules compared with human reading by thoracic radiologists. Zhang et al. (2017) found that multiparametric MRI-based radiomics nomograms provided improved prognostic ability in advanced nasopharyngeal carcinoma (NPC). Metastatic tumors are common in the later stages of cancer, and radiomics has become important for diagnosing distant metastases: it quantifies tumor phenotype using a large number of image features, converting image data into a higher-dimensional space for subsequent mining (Zhou et al., 2017). Multiparametric radiomics, which extracts radiomic features from sets of longitudinal images in a high-dimensional multiparametric imaging space, has shown promising results in classifying malignant versus benign breast lesions with high sensitivity and specificity. Note, however, that some radiomic features depend on specific scanners, acquisition settings, and image processing steps, such as signal intensity and spatial resolution (Rizzo et al., 2018).

The output of a radiologist's diagnosis based on reading medical images is a free-text report. Automatically generating these reports can ease the clinical workflow, reduce radiologist burnout and medical errors, and improve quality of care. Related work applies the latest ML/DL methods from computer vision and natural language processing (NLP) to generate medical reports, but does not focus on the nuances of radiology (Krause et al., 2017). Radiomic research centers on disease prediction rather than radiology report generation, while DL/ML research on medical report generation leans toward the NLP side (Zhang et al., 2018). The bridge connecting medical images with NLP to generate radiology reports is missing. Traditional image captioning approaches can only generate short phrases, far from the complexity required by radiology reports (Lu et al., 2017). Recently, Liu et al. (2019) developed an automatic chest X-ray radiology report generation system and proposed a clinically coherent reward to fine-tune the generated reports for readability and clinical accuracy. They evaluated their method on the Open-I and MIMIC-CXR datasets and demonstrated improvements in natural language generation and clinical accuracy over several baselines. This is by far the most relevant research among automatic radiology report generation approaches (Han et al., 2018; Hsu et al., 2018). But Liu's work did not add the prior knowledge of radiologists to the deep learning model, nor did it incorporate the model into the clinical setting with a feedback loop from clinicians. In this project, we will develop an open-source tool (i-RadioDiagno) that improves on Liu's method by adding the prior knowledge of radiologists and integrating the model into the clinical decision support system, providing a feedback loop and engaging radiologists from day one. i-RadioDiagno will transform medical imaging diagnosis practice and build human-centered AI tools that enable radiologists and machines to work together to deliver personalized care.

Methods

In this project, we will develop an open-source tool called i-RadioDiagno with the following components: 1) radiomic feature extraction; 2) integration of these features into Liu's model as the prior knowledge of radiologists; and 3) a feedback loop through clinical practice.


Figure 1. Adding prior knowledge of radiologists (the model is based on Liu et al., 2019).

Tools to extract radiomic features

Many tools are available to extract radiomic features, such as NiftyNet (an open-source convolutional neural network platform for medical image analysis and image-guided therapy), DLTK (state-of-the-art reference implementations for deep learning on medical images), DeepMedic, U-Net (convolutional networks for biomedical image segmentation), V-Net, and SegNet (a deep convolutional encoder-decoder architecture for robust semantic pixel-wise labelling) (Lundervold and Lundervold, 2019). Some of the above tools are intended only for semantic segmentation and are hard to implement on the Apache MXNet framework. Pyradiomics is promising and can be smoothly integrated with Amazon SageMaker built-in algorithms (e.g., image classification, semantic segmentation, and seq2seq modeling). Pyradiomics (https://pyradiomics.readthedocs.io/) is an open-source Python package for extracting radiomic features from medical images (Griethuysen et al., 2017). Funded by the National Cancer Institute, it extracts radiomic features including first-order features, shape features, gray level co-occurrence matrix (GLCM) features, gray level size zone matrix (GLSZM) features, gray level run length matrix (GLRLM) features, neighboring gray tone difference matrix (NGTDM) features, and gray level dependency matrix (GLDM) features.
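A minimal pyradiomics extraction sketch is shown below, assuming a CT volume and a lesion segmentation mask stored as NIfTI files; the file names and settings are illustrative only.

```python
from radiomics import featureextractor

# Configure the extractor; binWidth controls intensity discretization.
extractor = featureextractor.RadiomicsFeatureExtractor(binWidth=25)
extractor.disableAllFeatures()                    # start from a clean slate
extractor.enableFeatureClassByName("firstorder")  # intensity histogram features
extractor.enableFeatureClassByName("shape")       # size and shape features
extractor.enableFeatureClassByName("glcm")        # gray level co-occurrence

# execute() returns an ordered dict of feature names -> values, plus
# diagnostics entries describing the image and extraction settings.
features = extractor.execute("lesion_ct.nii.gz", "lesion_mask.nii.gz")
for name, value in features.items():
    if not name.startswith("diagnostics"):
        print(name, value)
```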

Image-to-report model

The image-to-report model will be built on Liu's model (see Figure 1); our contribution is the addition of the prior knowledge of radiologists (e.g., radiomic features) to that model. Liu et al. (2019) introduced a hierarchical generation model with a CNN-RNN-RNN architecture and used reinforcement learning to improve the clinical accuracy of the generated reports. The image encoder CNN obtains spatial image features as an embedding of dimensionality $d$, identical to the word embedding dimension in the later RNN stages. The extracted radiomic features will be normalized and embedded into the same $d$ dimensions based on recent research output from the PI's lab (Wanyan et al., under review), which developed a new embedding based on a multi-filtering graph convolutional neural network (MF-GCN, see Figure 2). MF-GCN is based on a graph model but can be adapted to non-graph settings. We can use an attention model to learn the importance weights of different categories of radiomic features. The vectors of radiomic features will be aggregated, and a local graph convolutional neural network (GCN) will be applied to learn hidden personalized roles for the aggregated vector until it converges (see Figure 2, left). It is important to consider these different roles for downstream clustering and classification tasks. The sentence decoder RNN adopts a Long Short-Term Memory (LSTM) network and models the hidden state as $h_i, m_i = \mathrm{LSTM}(\bar{v};\, h_{i-1}, m_{i-1})$, where $h_{i-1}$ and $m_{i-1}$ are the hidden state vector and the memory vector for the previous sentence. From the hidden state $h_i$, the topic vector $\tau_i$ is obtained. The word decoder RNN then decodes words given the topic vector $\tau_i$ by sampling the next word according to its probability. Reinforcement learning will be implemented for report readability, with a reward function derived from the clinically coherent reward metrics emphasizing positive, negative, uncertain, and absent mentions of disease states. In this project, we will use pyradiomics to extract features, use Amazon SageMaker built-in algorithms and Apache MXNet on AWS for computer vision and NLP (e.g., GluonCV and GluonNLP) to develop the model, and compare it with Liu's CNN-RNN-RNN model to seek better performance.
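To make the fusion step concrete, here is a minimal MXNet Gluon sketch of one plausible way to project radiomic features into the $d$-dimensional image embedding space and gate the two modalities before the sentence decoder. The class name, gating design, and dimensions are our illustrative assumptions, not part of Liu et al.'s published model.

```python
from mxnet import nd
from mxnet.gluon import nn

class RadiomicFusion(nn.Block):
    """Project a raw radiomic feature vector into the d-dimensional image
    embedding space and blend the two modalities with a learned scalar gate."""

    def __init__(self, d=512, n_radiomic=107, **kwargs):
        super(RadiomicFusion, self).__init__(**kwargs)
        # Normalize/project radiomic features into the shared embedding space.
        self.radiomic_embed = nn.Dense(d, in_units=n_radiomic, activation="tanh")
        # Scalar gate deciding how much each modality contributes.
        self.gate = nn.Dense(1, in_units=2 * d)

    def forward(self, img_emb, radiomic):
        # img_emb:  (batch, d)          image embedding from the CNN encoder
        # radiomic: (batch, n_radiomic) raw pyradiomics feature vector
        rad_emb = self.radiomic_embed(radiomic)
        g = nd.sigmoid(self.gate(nd.concat(img_emb, rad_emb, dim=1)))
        return g * img_emb + (1 - g) * rad_emb  # fused d-dim vector (v-bar)

# Example with random tensors (shapes illustrative).
net = RadiomicFusion(d=512, n_radiomic=107)
net.initialize()
v_bar = net(nd.random.normal(shape=(2, 512)), nd.random.normal(shape=(2, 107)))
print(v_bar.shape)  # (2, 512): input to the sentence-decoder LSTM
```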

Figure 2. Normalizing feature embeddings using MF-GCN (Wanyan et al., under review), which can be adapted to normalize radiomic features. The right panel shows that features normalized with MF-GCN cluster better than those produced by baseline methods.

Feedback loop through clinical practice

The generated reports will be examined by radiologists, and their feedback will be captured using user-friendly tools/FHIR apps developed with AWS Amplify plus AppSync, which provide real-time updates and support a single development environment for both web and mobile implementations. This feedback will flow back into the model workflow to improve radiomic feature extraction and embedding (e.g., adding more weight to features that matter most to radiologists or that relate most closely to the patient's medical history). The Co-PI has extensive experience in developing such tools to facilitate radiology diagnosis and can capture feedback from radiologists (Haight et al., 2018). The PI also works closely with UX designers from the School of Information at UT Austin, one of the top-ranked iSchools and design schools in the country. The PI's lab (Chen et al., 2019) has developed a user interface prototype, called an info button, that surfaces extra medical information from the literature and related websites to support evidence-based care during clinicians' diagnosis procedures. The Co-PI will help test i-RadioDiagno in clinical settings at Dell Medical School at UT Austin and integrate it into the mentoring program with residents and fellows from the radiology department.

We also collaborate closely with Dr. Jon Tamir, faculty in the Department of Electrical and Computer Engineering at UT Austin. Dr. Tamir has developed novel scanning techniques using deep learning image reconstruction, which will be tested at the MusculoSkeletal (MSK) clinic to treat musculoskeletal joint pain. This proposal is closely related to Dr. Tamir's proposal on "AI-driven magnetic resonance imaging for same-day point-of-care imaging and diagnosis": his proposal focuses on the first phase, from MRI scan signals to images, while ours focuses on the second phase, from images to diagnosis notes, and on building the feedback loops between radiologists and algorithms. Together, the two proposals form a cycle that brings algorithms to the bedside and feeds bedside experience back to the algorithms. One important output of this project is a benchmark for medical imaging that establishes healthcare providers' trust in AI/ML algorithms and promotes transparent evaluation.
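As a purely hypothetical illustration of the feedback loop's data path, the sketch below logs a radiologist's correction to a generated sentence and nudges the attention weight of the associated radiomic feature category; the record fields and the update rule are placeholders, not a specification of the production FHIR app.

```python
from dataclasses import dataclass, field

@dataclass
class FeedbackRecord:
    """One radiologist correction to a generated report sentence."""
    study_id: str
    draft_sentence: str
    corrected_sentence: str
    feature_category: str  # e.g., "shape", "firstorder", "glcm"

@dataclass
class CategoryWeights:
    """Attention weights over radiomic feature categories, nudged by feedback."""
    weights: dict = field(default_factory=lambda: {
        "shape": 1.0, "firstorder": 1.0, "glcm": 1.0})
    lr: float = 0.05  # step size per correction

    def update(self, record: FeedbackRecord) -> None:
        # Categories radiologists correct most often matter most to them:
        # raise that category's weight, then renormalize to sum to one.
        self.weights[record.feature_category] += self.lr
        total = sum(self.weights.values())
        self.weights = {k: v / total for k, v in self.weights.items()}

# Example usage with made-up content:
w = CategoryWeights()
w.update(FeedbackRecord("study-001", "No focal lesion.",
                        "Small hypodense lesion in segment IV.", "shape"))
print(w.weights)
```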

Expected Results

i-RadioDiagno will be an open-source tool built on the Amazon SageMaker platform that integrates existing radiomic feature extraction tools and embeds them into the workflow of the clinical decision support system (CDSS) to facilitate radiologists' diagnosis process. The user-friendly interface of i-RadioDiagno will allow radiologists to view radiomic features and calculations visually; their diagnosis procedure will be reduced to a few mouse clicks on pick lists generated by i-RadioDiagno, and diagnosis notes will be created semi-automatically. With i-RadioDiagno, all the information and footsteps of the radiologist's diagnosis process are captured, stored, and turned into high-quality labels for medical images, which are critical for AI-powered medical imaging diagnosis.

Outputs

We aim to publish 2-3 conference or journal articles from this project. Target venues include the AMIA Annual Symposium, the AMIA Informatics Summit, the International Conference on Artificial Intelligence in Medicine, the AAAI conference, the Journal of the American College of Radiology, BMC Bioinformatics, PLOS journals, Scientific Reports, Scientific Data, and others. We will present the project's progress as workshop papers or posters at related venues. Both the PI and Co-PI are active speakers at conferences and events, and we will promote this project widely.

References

Bryan, R. N. (2010). Introduction to the Science of Medical Imaging. Cambridge University Press.

Chen, W., Sun, P., Yang, J., Ding, Y., & Rousseau, J. (2019). Building an evidence-based AI clinical decision support system with MIMIC open data and user-centered research process. Good Systems Symposium, UT Austin, Austin, Texas, October 7, 2019.

Chen, X., et al. (2017). Assessment of treatment response during chemoradiation therapy for pancreatic cancer based on quantitative radiomic analysis of daily CTs: An exploratory study. PLOS ONE, 12, e017896.

Eilaghi, A., et al. (2017). CT texture features are associated with overall survival in pancreatic ductal adenocarcinoma: A quantitative analysis. BMC Medical Imaging, 17, 38.

Ganeshan, B., Abaleke, S., Young, R. C., Chatwin, C. R., & Miles, K. A. (2010). Texture analysis of non-small cell lung cancer on unenhanced computed tomography: Initial evidence for a relationship with tumor glucose metabolism and stage. Cancer Imaging, 10, 137.

Ganeshan, B., et al. (2013). Non-small cell lung cancer: Histopathologic correlates for texture parameters at CT. Radiology, 266, 326–336.

Griethuysen, J. J. M., Fedorov, A., Parmar, C., Hosny, A., Aucoin, N., Narayan, V., Beets-Tan, R. G. H., Fillion-Robin, J. C., Pieper, S., & Aerts, H. J. W. L. (2017). Computational radiomics system to decode the radiographic phenotype. Cancer Research, 77(21), e104–e107.

Haight, T., Bryan, R. N., Meirelles, O., Tracy, R., Fornage, M., Richard, M., et al. (2018). Associations of plasma clusterin and Alzheimer's disease-related MRI markers in adults at mid-life: The CARDIA Brain MRI substudy. PLOS ONE, 13(1), e0190478.

Han, Z., Wei, B., Leung, S., Chung, J., & Li, S. (2018). Towards automatic report generation in spine radiology using weakly supervised framework. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 185–193). Springer.

Hsu, T. H., Weng, W. H., Boag, W., McDermott, M., & Szolovits, P. (2018). Unsupervised multimodal representation learning across medical images and reports. arXiv preprint arXiv:1811.08615.

Huang, et al. (2018). Added value of computer-aided CT image features for early lung cancer diagnosis with small pulmonary nodules: A matched case-control study. Radiology, 286(1), 286–295.

Krause, J., Johnson, J., Krishna, R., & Li, F. (2017). A hierarchical approach for generating descriptive image paragraphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 317–325).

Liu, G., Hsu, T., McDermott, M., Boag, W., Weng, W., Szolovits, P., & Ghassemi, M. (2019). Clinically accurate chest x-ray report generation. https://arxiv.org/abs/1904.02633

Lu, J., Xiong, C., Parikh, D., & Socher, R. (2017). Knowing when to look: Adaptive attention via a visual sentinel for image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 375–383).

Lundervold, A. S., & Lundervold, A. (2019). An overview of deep learning in medical imaging focusing on MRI. Zeitschrift für Medizinische Physik, 29(2), 102–127.

Nasief, H., Zheng, C., Schott, D., Hall, W., Tsai, S., Erickson, B., & Li, X. A. (2019). A machine learning based delta-radiomics process for early prediction of treatment response of pancreatic cancer. npj Precision Oncology, 3, 25. doi:10.1038/s41698-019-0096-z

Nicolasjilwan, M., et al. (2014). Addition of MR imaging features and genetic biomarkers strengthens glioblastoma survival prediction in TCGA patients. Journal of Neuroradiology. doi:10.1016/j.neurad.2014.02.006

Parekh, V. S., & Jacobs, M. A. (2019). Deep learning and radiomics in precision medicine. Expert Review of Precision Medicine and Drug Development, 4(2), 59–72.

Parmar, C., et al. (2015). Radiomic feature clusters and prognostic signatures specific for lung and head & neck cancer. Scientific Reports, 5, 11044.

Rizzo, S., et al. (2018). Radiomics: The facts and the challenges of image analysis. European Radiology Experimental, 2, 36. https://doi.org/10.1186/s41747-018-0068-z

Topol, E. (2019). Deep Medicine. Basic Books.

Wanyan, T., Zhang, C., Azad, A., Liang, X., Li, D., & Ding, Y. (under review). Deep network embedding through multi-filtering GCN. International Joint Conference on Artificial Intelligence (IJCAI 2020).

Yuan, R., Shi, S., Chen, J., & Cheng, G. (2018). Radiomics in RayPlus: A web-based tool for texture analysis in medical images. Journal of Digital Imaging, 32, 269–275.

Zhang, L., et al. (2015). IBEX: An open infrastructure software platform to facilitate collaborative work in radiomics. Medical Physics, 42, 1341–1353.

Zhang, Y., Ding, D. Y., Qian, T., Manning, C. D., & Langlotz, C. P. (2018). Learning to summarize radiology findings. arXiv preprint arXiv:1809.04698.

Zhou, H., et al. (2017). Diagnosis of distant metastasis of lung cancer: Based on clinical and radiomic features. Translational Oncology, 11(1), 31–36.