New research proposes a system for determining the relative accuracy of predictive AI in a hypothetical medical setting, and when the system should defer to a human clinician.
Artificial intelligence (AI) has great potential to improve the way people work across many industries. But to integrate AI tools into the workplace safely and responsibly, we need to develop more robust methods to understand when they can be most useful.
So when is AI more accurate, and when are humans? This question is particularly important in healthcare, where predictive AI is increasingly used in high-stakes tasks to support clinicians.
Today, in Nature Medicine, we published our joint paper with Google Research, which proposes CoDoC (Complementarity-driven Deferral-to-Clinical Workflow), an AI system that learns when to rely on predictive AI tools and when to defer to a clinician for the most accurate interpretation of medical images.
CoDoC explores how we could leverage human-AI collaboration in hypothetical medical settings to achieve the best outcomes. In one example scenario, CoDoC reduced the number of false positives by 25% for a large anonymized UK mammography dataset, compared to commonly used clinical workflows, without missing any true positives.
This work is the result of collaboration with several healthcare organizations, including the United Nations Office for Project Services' Stop TB Partnership. To help researchers build on our work to improve the transparency and safety of AI models for the real world, we have also open-sourced CoDoC's code on GitHub.
CoDoC: complementary tool for human-AI collaboration
Building more reliable AI models often requires rethinking the complex inner workings of predictive AI models. However, for many healthcare providers, rethinking a predictive AI model is simply not possible. CoDoC can potentially help improve predictive AI tools for their users without requiring them to modify the underlying AI tool itself.
When developing CoDoC, we had three criteria:
- Non-machine learning experts, such as healthcare providers, should be able to deploy the system and run it on a single computer.
- Training would require a relatively small amount of data – typically only a few hundred examples.
- The system could be compatible with any proprietary AI model and would not need access to the inner workings of the model or the data it was trained on.
Determining when predictive AI or a clinician is more accurate
With CoDoC, we deliver a simple, usable AI system to improve reliability by helping predictive AI systems “know when they don’t know.” We looked at scenarios where a clinician might have access to an AI tool designed to help interpret an image, such as reviewing a chest X-ray to know if a tuberculosis test is needed.
For any theoretical clinical setting, the CoDoC system requires only three inputs for each case in the training dataset (a minimal sketch of this format follows the list below).
- Predictive AI generates a confidence score between 0 (certain that no disease is present) and 1 (certain that a disease is present).
- The clinician’s interpretation of the medical image.
- The ground-truth label for the presence or absence of disease, established for example by biopsy or other clinical follow-up.
Note: CoDoC does not require access to medical images.
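To make this data format concrete, here is a minimal sketch in Python of what one training record could look like. The class and field names are illustrative assumptions for this post, not taken from the released CoDoC code.

```python
from dataclasses import dataclass

@dataclass
class TrainingCase:
    """One training example: the three inputs listed above (names assumed)."""
    ai_confidence: float    # predictive AI confidence score in [0, 1]
    clinician_opinion: int  # clinician's read of the image: 0 = no disease, 1 = disease
    ground_truth: int       # established e.g. by biopsy or other clinical follow-up

# A hypothetical case: the AI is fairly confident disease is present,
# the clinician read the image as negative, and follow-up confirmed disease.
example = TrainingCase(ai_confidence=0.82, clinician_opinion=0, ground_truth=1)
```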
CoDoC learns to establish the relative accuracy of the predictive AI model versus clinicians’ interpretation, and how this relationship fluctuates with predictive AI confidence scores.
Once trained, CoDoC could be inserted into a hypothetical future clinical workflow involving both an AI and a clinician. When a new patient image is evaluated by the predictive AI model, its associated confidence score is fed into the system. Next, CoDoC evaluates whether accepting the AI’s decision or deferring to a clinician will ultimately result in the most accurate interpretation.
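As an illustration of that decision, the sketch below compares the clinician's and the AI's accuracy on the training data within bins of the confidence score, and defers whenever the clinician was more accurate in that bin. This is a simplified stand-in for the learned deferral rule, not the estimator published in the paper; the bin count, the 0.5 decision threshold, and all names are assumptions.

```python
import numpy as np

def fit_deferral_table(ai_conf, clinician_opinion, ground_truth, n_bins=10):
    """For each confidence bin, record whether deferring to the clinician
    was more accurate than accepting the AI's decision on the training set."""
    ai_conf = np.asarray(ai_conf, dtype=float)
    clinician_opinion = np.asarray(clinician_opinion, dtype=int)
    ground_truth = np.asarray(ground_truth, dtype=int)

    ai_pred = (ai_conf >= 0.5).astype(int)              # assumed decision threshold
    bins = np.minimum((ai_conf * n_bins).astype(int), n_bins - 1)
    defer = np.zeros(n_bins, dtype=bool)
    for b in range(n_bins):
        mask = bins == b
        if not mask.any():
            continue                                     # no training cases in this bin
        ai_acc = np.mean(ai_pred[mask] == ground_truth[mask])
        clin_acc = np.mean(clinician_opinion[mask] == ground_truth[mask])
        defer[b] = clin_acc > ai_acc                     # clinician was more accurate here
    return defer

def decide(confidence_score, defer_table):
    """Return 'clinician' if the case should be deferred, else 'ai'."""
    n_bins = len(defer_table)
    b = min(int(confidence_score * n_bins), n_bins - 1)
    return "clinician" if defer_table[b] else "ai"

# Usage (with hypothetical training arrays):
# table = fit_deferral_table(train_conf, train_clinician, train_truth)
# decide(0.82, table)  # -> "clinician" or "ai", depending on the fitted table
```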
Increased accuracy and efficiency
Our comprehensive testing of CoDoC with multiple real-world datasets – using only historical and anonymized data – has shown that combining the best of human expertise and predictive AI achieves greater accuracy than either alone.
In addition to reducing false positives for a mammography dataset by 25%, in hypothetical simulations where an AI was allowed to act autonomously on certain occasions, CoDoC was able to reduce the number of cases that needed to be read by a clinician by two thirds. We also showed how CoDoC could hypothetically improve the triage of chest X-rays for subsequent TB testing.
Responsibly developing AI for healthcare
Although this work is theoretical, it shows our AI system's potential to adapt: CoDoC was able to improve the performance of medical image interpretation across varied demographic populations, clinical settings, medical imaging equipment, and disease types.
CoDoC is a promising example of how we can harness the benefits of AI in combination with human strengths and expertise. We are working with external partners to rigorously evaluate our research and the potential benefits of the system. To safely bring technology like CoDoC into real-world medical settings, healthcare providers and manufacturers will also need to understand how clinicians interact differently with AI, and validate systems with specific medical AI tools and settings.
Learn more about CoDoC: