The new research, conducted with the NHS Breast Screening Programme (NHSBSP), found the AI algorithm performed marginally better than medical professionals at detecting breast cancer across 120 mammogram exams.
Scientists hope that further research along these lines will see AI incorporated into breast cancer screening to assist doctors with diagnoses.
Mammograms should ideally be read by two readers to avoid false positive diagnoses, but shortages of radiologists make this difficult to achieve.
The British research project, published in the Radiological Society of North America (RSNA) journal Radiology, pitted a commercially available AI algorithm against human readers from the NHS in interpreting mammograms.
The screening method uses a mammographer to take several X-rays of each breast to check for signs of breast cancer that are too small to see or feel.
Under the NHS Breast Screening Programme (NHSBSP), women usually receive their first invitation for mammographic screening between the ages of 50 and 53.
They are then invited back for screening every three years up to the age of 70.
However, the screening does not detect every breast cancer, and false-positive interpretations of tests can lead to women undergoing unnecessary imaging and even biopsies.
One method that can improve both the sensitivity and specificity of screening is for each mammogram to be read by two separate readers.
According to the study’s researchers, from the University of Nottingham, double reading increases cancer detection rates by 6 to 15 percent, whilst also keeping recall rates low.
But this strategy is labor-intensive and increasingly difficult to sustain amid worldwide shortages of readers.
The research team, led by Professor Yan Chen, used tests from the Personal Performance in Mammographic Screening (PERFORMS) quality assurance assessment used by the NHSBSP to compare the human readers with an AI algorithm.
Each PERFORMS test consists of 60 challenging mammogram exams from the NHSBSP with benign, normal and abnormal findings.
The research team used data from two consecutive PERFORMS test sets – 120 screening mammograms in total – and evaluated the AI algorithm on the same two sets. For each test mammogram, each human reader’s score was then compared with the AI results.
When they compared the AI test scores with those of the 552 human readers – which included 315 board-certified radiologists, 206 radiographers and 31 breast clinicians – the researchers found little to no difference in performance.
Human reader performance demonstrated a mean of 90 percent sensitivity and 76 percent specificity, whereas AI scored one percentage point higher in each category (91 percent sensitivity, 77 percent specificity).
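Sensitivity and specificity are standard ways of scoring a reader: sensitivity is the share of actual cancers that get flagged, and specificity is the share of cancer-free exams that are correctly cleared. A minimal sketch of how these figures are derived from confusion-matrix counts is below; the counts themselves are hypothetical, invented purely to illustrate the calculation, and are not the study's raw data.

```python
# Illustrative only: computing sensitivity and specificity from
# confusion-matrix counts. The counts below are hypothetical.

def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual cancers the reader flags (true-positive rate)."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of cancer-free exams the reader correctly clears."""
    return true_neg / (true_neg + false_pos)

# Hypothetical reader marking 60 exams: 20 with cancer, 40 normal/benign.
tp, fn = 18, 2    # 18 of the 20 cancers caught
tn, fp = 30, 10   # 30 of the 40 normal exams correctly cleared

print(f"sensitivity = {sensitivity(tp, fn):.0%}")  # prints "sensitivity = 90%"
print(f"specificity = {specificity(tn, fp):.0%}")  # prints "specificity = 75%"
```

The trade-off the article describes follows directly from these definitions: recalling more women for follow-up raises sensitivity but lowers specificity, which is why double reading's ability to improve both at once is notable.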
“There is a lot of pressure to deploy AI quickly to solve these problems, but we need to get it right to protect women’s health,” Prof. Chen explained.
“The 552 readers in our study represent 68 percent of readers in the NHSBSP, so this provides a robust performance comparison between human readers and AI.
“The results of this study provide strong supporting evidence that AI for breast cancer screening can perform as well as human readers.
“It’s really important that human readers working in breast cancer screening demonstrate satisfactory performance, and the same will be true for AI once it enters clinical practice.”
Prof. Chen did warn, though, that further research was needed before AI was introduced as a second reader in clinical breast cancer screenings, adding that performance can drift over time and algorithms can be affected by changes in the operating environment.
“I think it is too early to say precisely how we will ultimately use AI in breast screening,” she admitted.
“The large prospective clinical trials that are ongoing will tell us more.
“But no matter how we use AI, the ability to provide ongoing performance monitoring will be crucial to its success.
“It’s vital that imaging centers have a process in place to provide ongoing monitoring of AI once it becomes part of clinical practice.
“There are no other studies to date that have compared the performance of such a large number of human readers on routine quality assurance test sets to AI, so this study may provide a model for assessing AI performance in a real-world setting.”
Produced in association with SWNS Talker