When diagnosing skin conditions based solely on images of a patient’s skin, doctors are not as effective when the patient has darker skin, according to a new study by MIT researchers.
The study, which included more than 1,000 dermatologists and general practitioners, found that dermatologists accurately characterized about 38 percent of the images they saw, but only 34 percent of those showing darker skin. GPs, who were less accurate overall, showed a similar decrease in accuracy with darker skin.
The research team also found that using an artificial intelligence algorithm could improve doctors’ accuracy, although these improvements would be greater when diagnosing lighter-skinned patients.
Although this is the first study demonstrating physician diagnostic disparities based on skin tone, other studies have found that images used in dermatology textbooks and training materials primarily feature lighter skin tones. According to the MIT team, this could be a contributing factor to this discrepancy, as well as the possibility that some doctors have less experience treating patients with darker skin.
“No doctor probably intends to do worse on any type of person, but you may not have all the knowledge and experience, and so on certain groups of people you might do worse,” says Matt Groh PhD ’23, assistant professor at the Kellogg School of Management at Northwestern University. “This is one of those situations where you need empirical evidence to help people understand how you might want to change policies around dermatology education.”
Groh is the lead author of the study, which appears today in Nature Medicine. Rosalind Picard, professor of media arts and sciences at MIT, is the senior author of the paper.
Several years ago, an MIT study led by Joy Buolamwini PhD ’22 found that facial analysis programs had much higher error rates when predicting the gender of darker-skinned people. This discovery inspired Groh, who studies human-AI collaboration, to explore whether AI models, and possibly doctors themselves, might have difficulty diagnosing skin diseases on darker skin tones — and whether these diagnostic capabilities could be improved.
“It seemed like a great opportunity to identify if there is a social problem and how we might solve it, as well as determine how best to integrate AI assistance into medical decision-making,” says Groh. “I’m very interested in how we can apply machine learning to real-world problems, especially to help experts improve their work. Medicine is a space where people make really important decisions, and if we could improve their decision-making, we could improve patient outcomes.”
To assess doctors’ diagnostic accuracy, the researchers compiled a series of 364 images from dermatology textbooks and other sources, depicting 46 skin diseases across a range of skin tones.
Most of these images depicted one of eight inflammatory skin diseases, including atopic dermatitis, Lyme disease and secondary syphilis, as well as a rare form of cancer called cutaneous T-cell lymphoma (CTCL), which may resemble an inflammatory skin condition. Many of these diseases, including Lyme disease, can manifest differently on dark or light skin.
The research team recruited subjects for the study through Sermo, a social networking site for doctors. The total study group included 389 board-certified dermatologists, 116 dermatology residents, 459 general practitioners, and 154 other types of physicians.
Each of the study participants was shown 10 of the images and asked for their top three predictions about the disease each image might represent. They were also asked if they would refer the patient for a biopsy. Additionally, general practitioners were asked if they would refer the patient to a dermatologist.
“It’s not as comprehensive as in-person triage, where the doctor can examine the skin from different angles and control the lighting,” says Picard. “However, skin images are more scalable for online triage and they are easy to input into a machine learning algorithm, which can quickly estimate likely diagnoses.”
The researchers found that, unsurprisingly, dermatology specialists had higher accuracy rates: they correctly classified 38 percent of images, compared to 19 percent for general practitioners.
These two groups lost about four percentage points in accuracy when trying to diagnose skin conditions based on images of darker skin — a statistically significant drop. Dermatologists were also less likely to refer images of CTCL on darker skin for biopsy, but more likely to refer darker-skin images of noncancerous conditions for biopsy.
“This study clearly demonstrates that there is a disparity in the diagnosis of skin conditions in dark skin. This disparity is not surprising; however, I have not seen it demonstrated in the literature as robustly. Further research should be done to try to determine more precisely what the causative and mitigating factors for this disparity might be,” says Jenna Lester, associate professor of dermatology and director of the Skin of Color program at the University of California, San Francisco, who was not involved in the study.
A helping hand from AI
After evaluating the doctors’ performance on their own, the researchers also gave them additional images to analyze using an AI algorithm developed by the researchers. The researchers trained this algorithm on about 30,000 images, asking it to classify each image as one of the eight diseases depicted in most of the images, plus a ninth “other” category.
This algorithm had an accuracy rate of around 47 percent. The researchers also created another version of the algorithm with an artificially inflated success rate of 84 percent, allowing them to assess whether the model’s accuracy would influence doctors’ likelihood of following its recommendations.
“This allows us to evaluate AI assistance with models that are currently the best we can do, and with AI assistance that could be more accurate, perhaps in five years, with better data and better models,” says Groh.
These two classifiers were equally accurate on light and dark skin. The researchers found that using either of these AI algorithms improved accuracy for both dermatologists (up to 60 percent) and general practitioners (up to 47 percent).
They also found that doctors were more likely to accept the higher-accuracy algorithm’s suggestions after it had provided a few correct answers, but rarely incorporated AI suggestions that were incorrect. This suggests that doctors are highly skilled at ruling out diseases and won’t accept AI suggestions for a disease they’ve already ruled out, Groh says.
“They are very good at not following AI advice when the AI is wrong and the doctors are right. It’s something that’s useful to know,” he says.
While dermatologists using AI showed a similar increase in accuracy when looking at images of light or dark skin, GPs showed greater improvement on images of lighter skin than on images of darker skin.
“This study allows us to see not only how AI assistance influences accuracy, but how it influences doctors across all levels of expertise,” says Groh. “Perhaps what’s happening is that general practitioners don’t have as much experience, so they don’t know whether to rule out a disease or not, because they aren’t as familiar with how different skin diseases can look on different skin tones.”
The researchers hope their findings will help encourage medical schools and textbooks to incorporate more training on darker-skinned patients. The findings could also help guide the deployment of AI assistance programs in dermatology, which many companies are currently developing.
The research was funded by the MIT Media Lab Consortium and the Harold Horowitz Student Research Fund.