The development of artificial intelligence (AI) for healthcare applications reached a major milestone in January of this year with the first FDA approval of an AI algorithm for clinical use. Two studies published within the past 11 months have shown that existing AI algorithms can detect retinal disease with physician-level accuracy.¹,² The algorithms developed in these studies were designed to autonomously identify diabetic retinopathy, diabetic macular edema (DME), and age-related macular degeneration (ARMD).
In both studies, deep convolutional neural networks (DCNNs) were used to let a computer learn, directly from fundus images, the features that characterize these diseases. Researchers are working to validate this new technology in the hope that it can help alleviate the lack of access to medical eye care in many parts of the world. The technology is also seen as a potential tool for mass screening programs that aim to diagnose chronic retinal disease at early stages.² Earlier diagnosis would allow affected patients to receive earlier treatment and avoid vision loss that would otherwise occur if they were diagnosed at later stages.²
AI Detecting Diabetic Retinopathy and DME
A team of researchers working with Google recently published the results of a study titled Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. The study was published in December of 2016 in JAMA. The objective of the study was to test the accuracy of an artificial intelligence algorithm in detecting diabetic retinopathy and diabetic macular edema from fundus images.
The algorithm used in the study was a modified version of the Inception-v3 neural network, a successor to GoogLeNet that was developed by researchers at Google and University College London in 2015.³,⁵ The algorithm was trained with a retrospective development data set consisting of 128,175 images. Each image was graded three to seven times for diabetic retinopathy, diabetic macular edema, and image gradability by a panel of 54 US-licensed ophthalmologists and ophthalmology senior residents.
The performance of the algorithm was assessed on two validation data sets, EyePACS-1 and Messidor-2. The EyePACS-1 data set consisted of 9963 images from 4997 patients. The Messidor-2 data set had 1748 images from 874 patients. The two sets were graded by the US board-certified ophthalmologists with the highest rate of self-consistency (eight ophthalmologists graded EyePACS-1 and seven graded Messidor-2). A simple majority decision (an image was classified as referable if ≥50% of ophthalmologists graded it referable) served as the reference standard for both referability and gradability. The graders were masked to the judgments of other graders.
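The majority-decision rule described above can be sketched in a few lines of Python. This is only an illustration of the ≥50% rule as stated in the text, not the study's code; the function name `majority_reference` and the example grades are hypothetical.

```python
from typing import Sequence


def majority_reference(grades: Sequence[bool]) -> bool:
    """Return True when at least half of the graders marked the image referable.

    Hypothetical helper: each entry in `grades` is one grader's
    referable / not referable call for a single image.
    """
    if not grades:
        raise ValueError("at least one grade is required")
    # sum() counts the True votes; comparing 2 * votes with the panel size
    # implements the ">= 50%" rule without floating-point division.
    return 2 * sum(grades) >= len(grades)


# Made-up grades from a panel of eight graders (not study data).
example_grades = [True, True, True, True, False, False, False, False]
print(majority_reference(example_grades))
```

Note that with an even-sized panel, such as the eight graders used for EyePACS-1, an exact tie counts as referable under this rule.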
On the EyePACS-1 validation set, with a configuration optimized for high sensitivity in the development set, the algorithm achieved a sensitivity of 97.5% and a specificity of 93.4%. With a configuration optimized for high specificity, it achieved a sensitivity of 90.3% (95% CI, 87.5%-92.7%) and a specificity of 98.1% (95% CI, 97.8%-98.5%). For detecting referable diabetic retinopathy, the algorithm had an area under the receiver operating characteristic curve of 0.991 (95% CI, 0.988-0.993).
On the Messidor-2 validation set, with the configuration optimized for high sensitivity, the algorithm achieved a sensitivity of 96.1% and a specificity of 93.9%. With the configuration optimized for high specificity, it achieved a sensitivity of 87.0% (95% CI, 81.1%-91.0%) and a specificity of 98.5% (95% CI, 97.7%-99.1%). For detecting referable diabetic retinopathy, the algorithm had an area under the receiver operating characteristic curve of 0.990 (95% CI, 0.986-0.995).
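The two configurations reported above reflect a trade-off between sensitivity and specificity. One common way to get such a trade-off is to apply different decision thresholds to a model's output score, and the sketch below assumes that setup. The scores, labels, thresholds, and the function name `sensitivity_specificity` are all made up for illustration and are not taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) when score >= threshold means 'referable'."""
    tp = fn = tn = fp = 0
    for score, has_disease in zip(scores, labels):
        predicted = score >= threshold
        if has_disease and predicted:
            tp += 1
        elif has_disease:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    return tp / (tp + fn), tn / (tn + fp)


# Toy scores and reference labels (illustrative only, not study data).
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.90, 0.95]
labels = [False, False, False, False, True, False, True, True, True, True]

# A low threshold favors sensitivity; a high threshold favors specificity.
print(sensitivity_specificity(scores, labels, 0.30))  # (1.0, 0.6)
print(sensitivity_specificity(scores, labels, 0.58))  # (0.8, 1.0)
```

Lowering the threshold catches every diseased eye in this toy set at the cost of more false referrals, while raising it removes the false referrals but misses one case. The study's two operating points show the same pattern.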
AI Detecting ARMD
A team of researchers from the Wilmer Eye Institute at Johns Hopkins University recently published the results of a study titled Automated Grading of Age-Related Macular Degeneration From Color Fundus Images Using Deep Convolutional Neural Networks. The study was published in September of 2017 in JAMA Ophthalmology. The objective of the study was to leverage recent advances in artificial intelligence to explore a novel application of DCNNs: identifying ARMD in fundus images. More specifically, the study aimed to solve a 2-class ARMD classification problem: distinguishing fundus images of individuals who have either no or early ARMD from those of individuals who have intermediate or advanced ARMD.
The algorithm used in this study was a modified version of the AlexNet DCNN model, which was originally developed at the University of Toronto in 2012.⁴,⁵ Because no other large sets of ARMD fundus images were available, the Age-Related Eye Disease Study (AREDS) data set was used for both training and validation of the algorithm (with several partitioning methods). The data set contained 130,000 patient photos that were collected over a period of 12 years. To compare the performance of the algorithm with that of a human physician, the researchers recruited an ophthalmologist who graded 5,000 fundus photos from the AREDS. The performance of both the algorithm and the ophthalmologist was compared with the classifications given to the photos during the AREDS study, which were used as the “gold standard” for classification.
The algorithm's accuracy ranged between 88.4% (SD, 0.5%) and 91.6% (SD, 0.1%), depending on which partitioned data subset was analyzed. The area under the receiver operating characteristic curve was between 0.94 and 0.96, and κ (SD) was between 0.764 (0.010) and 0.829 (0.003), which indicated substantial agreement with the gold standard Age-Related Eye Disease Study data set. By comparison, the accuracy of the human grader ranged from 90.2% to 91.6%, with κ (SD) between 0.800 and 0.829 (0.003). More detailed results are listed below for the image set created with the baseline partitioning method (termed standard partitioning). In this data set, images taken at each patient visit (approximately every 2 years) were considered unique.
Algorithm Results for Standard Partitioning Image Set
Accuracy: 90.0% (SD, 0.6%) – 91.6% (SD, 0.1%)
Sensitivity: 85.7% (SD, 2.3%) – 88.4% (SD, 0.7%)
Specificity: 91.8% (SD, 0.3%) – 94.1% (SD, 0.6%)
Positive Predictive Value: 91.3% (SD, 1%) – 92.3% (SD, 0.7%)
Negative Predictive Value: 89.1% (SD, 1.4%) – 91.1% (SD, 0.4%)
κ (SD): 0.700 (0.008) – 0.829 (0.003)
Human Physician Results for Images Partitioned via Standard Partitioning
Positive Predictive Value: 91.0%
Negative Predictive Value: 89.6%
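The summary statistics reported above (accuracy, positive and negative predictive value, and κ) can all be derived from a 2×2 table of grader calls versus the gold standard. The following is a minimal sketch under that assumption. The `Confusion` class, the `summary_metrics` function, and the counts are hypothetical and are not drawn from the AREDS results.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    """2x2 counts of predictions against the gold standard (hypothetical container)."""
    tp: int  # predicted intermediate/advanced, gold standard agrees
    fp: int  # predicted intermediate/advanced, gold standard says none/early
    fn: int  # predicted none/early, gold standard says intermediate/advanced
    tn: int  # predicted none/early, gold standard agrees


def summary_metrics(c: Confusion) -> Dict[str, float]:
    n = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / n
    ppv = c.tp / (c.tp + c.fp)  # positive predictive value
    npv = c.tn / (c.tn + c.fn)  # negative predictive value
    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance from the marginal totals.
    p_yes = ((c.tp + c.fp) / n) * ((c.tp + c.fn) / n)
    p_no = ((c.tn + c.fn) / n) * ((c.tn + c.fp) / n)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "ppv": ppv, "npv": npv, "kappa": kappa}


# Made-up counts for 100 images (illustrative only).
result = summary_metrics(Confusion(tp=40, fp=10, fn=10, tn=40))
print({name: round(value, 3) for name, value in result.items()})
```

With these toy counts, accuracy is 0.8 but κ is only about 0.6, because half of the agreement would be expected by chance alone. This is why κ is reported alongside accuracy.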
These studies indicate that we are now crossing the threshold of physician-quality artificial intelligence in eyecare. Both studies benefited from recent breakthroughs in AI, such as DCNNs and high-performance graphics processing units (GPUs).² Future eyecare applications of this technology will be heavily influenced by progress in the artificial intelligence field, which is currently experiencing a boom in research in both the private and public sectors. The field is also attracting top computer science talent from across the globe.
We can expect additional research on AI applications in eyecare to be published in the near future, driven by the growth of the AI field and by how quickly general-purpose AI algorithms can be adapted to eyecare-specific tasks. Another accelerator is that researchers have access to large data sets of fundus images and OCT scans, because these imaging technologies are widely used in clinical settings today. The timeline for clinical deployment of this technology will vary from country to country due to differences in regulations, but if the pace of current progress is any indication, deployment will likely occur in the near future.
1. Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, Venugopalan S, Widner K, Madams T, Cuadros J, Kim R, Raman R, Nelson PC, Mega JL, Webster DR. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA. 2016;316(22):2402–2410. doi:10.1001/jama.2016.17216
2. Burlina PM, Joshi N, Pekala M, Pacheco KD, Freund DE, Bressler NM. Automated Grading of Age-Related Macular Degeneration From Color Fundus Images Using Deep Convolutional Neural Networks. JAMA Ophthalmol. Published online September 28, 2017. doi:10.1001/jamaophthalmol.2017.3782
3. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the Inception Architecture for Computer Vision. December 2015.
4. Krizhevsky A, Sutskever I, Hinton GE. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems. 2012.
5. Deshpande A. The 9 Deep Learning Papers You Need To Know About (Understanding CNNs Part 3). August 24, 2016.