Artificial intelligence applications in colonoscopy
01/24/2023
Considerable advances in artificial intelligence (AI) and machine-learning (ML) methodologies have led to the emergence of promising tools in the field of gastrointestinal endoscopy. Computer vision is an application of AI/ML that has been successfully used for the computer-aided detection (CADe) and computer-aided diagnosis (CADx) of colon polyps and numerous other conditions encountered during GI endoscopy. Outside of computer vision, a wide variety of other AI tools have been applied in gastroenterology, ranging from natural language processing (NLP) that optimizes clinical documentation and endoscopy quality reporting to ML techniques that predict disease severity and treatment response and augment clinical decision-making.
In the United States, colonoscopy is the standard for colon cancer screening and prevention; however, precancerous polyps can be missed for various reasons, ranging from subtle surface appearance of the polyp or location behind a colonic fold to operator-dependent reasons such as inadequate mucosal inspection. Though clinical practice guidelines have set adenoma detection rate (ADR) thresholds at 20% for women and 30% for men, studies have shown a 4- to 10-fold variation in ADR among physicians in clinical practice settings,1 with an estimated adenoma miss rate (AMR) of 25% and a false-negative colonoscopy rate of 12%.2 Variability in adenoma detection affects the risk of interval colorectal cancer post colonoscopy.3,4

AI provides an opportunity for mitigating this risk. Advances in deep learning and computer vision have led to the development of CADe systems that automatically detect polyps in real time during colonoscopy, resulting in reduced adenoma miss rates (Table 1). In addition to polyp detection, deep-learning technologies are also being used in CADx systems for polyp diagnosis and characterization of malignancy risk. This could aid therapeutic decision-making: unnecessary resection or histopathologic analysis could be obviated for benign hyperplastic polyps. On the other end of the polyp spectrum, an AI system that predicts the presence or absence of submucosal invasion could be valuable when evaluating early colon cancers for endoscopic submucosal dissection vs. surgery. Examples of CADe polyp detection and CADx polyp characterization are shown in Figure 1.

Other potential computer vision applications that may improve colonoscopy quality include tools that help measure adequacy of mucosal exposure, segmental inspection time, and a variety of other parameters associated with polyp detection performance. These are promising areas for future research. Beyond improving colonoscopy technique, natural language processing tools already are being used to optimize clinical documentation as well as extract information from colonoscopy and pathology reports that can facilitate reporting of colonoscopy quality metrics such as ADR, cecal intubation rate, withdrawal time, and bowel preparation adequacy. AI-powered analytics may help unlock large-scale reporting of colonoscopy quality metrics on a health-systems level5 or population-level,6 helping to ensure optimal performance and identifying avenues for colonoscopy quality improvement.
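To make the ADR metric discussed above concrete: ADR is conventionally calculated as the proportion of screening colonoscopies in which at least one adenoma is detected. The following is a minimal Python sketch, not a validated quality-reporting tool. The `Procedure` record, its field names, the "F"/"M" coding, and the sample data are hypothetical and chosen only for illustration; the 20% and 30% thresholds are the ones cited earlier in this article.

```python
from dataclasses import dataclass

# Sex-specific ADR thresholds quoted in the article (women 20%, men 30%).
ADR_THRESHOLDS = {"F": 0.20, "M": 0.30}


@dataclass
class Procedure:
    """One screening colonoscopy (hypothetical record layout)."""
    sex: str             # "F" or "M"; this coding is an assumption
    adenomas_found: int  # histologically confirmed adenomas in this exam


def adenoma_detection_rate(procedures):
    """Fraction of procedures in which at least one adenoma was detected."""
    if not procedures:
        raise ValueError("no procedures supplied")
    with_adenoma = sum(1 for p in procedures if p.adenomas_found >= 1)
    return with_adenoma / len(procedures)


def adr_by_sex(procedures):
    """Return {sex: (adr, meets_threshold)} for each sex present in the data."""
    result = {}
    for sex, threshold in ADR_THRESHOLDS.items():
        subset = [p for p in procedures if p.sex == sex]
        if subset:
            adr = adenoma_detection_rate(subset)
            result[sex] = (adr, adr >= threshold)
    return result


# Made-up example data: four exams in women, four in men.
exams = [
    Procedure("F", 0), Procedure("F", 1), Procedure("F", 0), Procedure("F", 0),
    Procedure("M", 2), Procedure("M", 0), Procedure("M", 0), Procedure("M", 0),
]
report = adr_by_sex(exams)
for sex, (adr, ok) in report.items():
    print(f"{sex}: ADR={adr:.0%}, meets threshold: {ok}")
```

Note that an exam with several adenomas counts once, since ADR is a per-procedure rate rather than a per-polyp count. In practice, the inputs would come from linked colonoscopy and pathology reports, which is the extraction step the NLP tools described above aim to automate.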
The majority of AI research in colonoscopy has focused on CADe for colon polyp detection and CADx for polyp diagnosis. Over the last few years, several randomized clinical trials – two in the United States – have shown that CADe significantly improves adenoma detection and reduces adenoma miss rates in comparison to standard colonoscopy. The existing data are summarized in Table 1, focusing on the two U.S. studies and an international meta-analysis.
In comparison, the data landscape for CADx is nascent and currently limited to several retrospective studies dating back to 2009 and a few prospective studies that have shown promising results.10,11 There is an expectation that integrated CADx also may support the adoption of “resect and discard” or “diagnose and leave” strategies for low-risk polyps. About two-thirds of polyps identified on average-risk screening colonoscopies are diminutive polyps (5 mm or smaller), which rarely have advanced histologic features (about 0.5%) and are sometimes non-neoplastic (30%). Malignancy risk is even lower in the distal colon.12 As routine histopathologic assessment of such polyps is mostly of limited clinical utility and comes with added pathology costs, CADx technologies may offer a more cost-effective approach where polyps that are characterized in real time as low-risk adenomas or non-neoplastic are “resected and discarded” or “left in,” respectively. In 2011, prior to the development of current AI tools, the American Society for Gastrointestinal Endoscopy (ASGE) set performance thresholds for technologies supporting real-time endoscopic assessment of the histology of diminutive colorectal polyps. The ASGE recommended 90% histopathologic concordance for “resect and discard” tools and 90% negative predictive value for adenomatous histology for “diagnose and leave” tools.13 Narrow-band imaging (NBI), for example, has been shown to meet these benchmarks14,15 with a modeling study suggesting that implementing “resect and discard” strategies with such tools could result in annual savings of $33 million without adversely affecting efficacy, although practical adoption has been limited.16 More recent work has directly explored the feasibility of leveraging CADx to support “leave-in-situ” and “resect-and-discard” strategies.17
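As an illustration of how the two ASGE benchmarks described above might be checked against validation data, here is a minimal Python sketch. It assumes each polyp has a paired optical (CADx) call and a histopathology result. Negative predictive value is computed in the standard way, as true negatives divided by all negative calls. Treating “concordance” as simple per-polyp agreement with histopathology is a simplifying assumption made here, not necessarily how the ASGE operationalizes it. The function names and sample counts are hypothetical.

```python
ASGE_THRESHOLD = 0.90  # 90% benchmark quoted in the article for both strategies


def npv_for_adenoma(pairs):
    """NPV for adenomatous histology.

    pairs: sequence of (predicted_adenoma, histology_adenoma) booleans.
    A "negative" call means the tool predicted a non-adenomatous polyp.
    """
    pairs = list(pairs)
    true_neg = sum(1 for pred, truth in pairs if not pred and not truth)
    false_neg = sum(1 for pred, truth in pairs if not pred and truth)
    called_negative = true_neg + false_neg
    if called_negative == 0:
        raise ValueError("no polyps were called non-adenomatous")
    return true_neg / called_negative


def agreement_with_histology(pairs):
    """Per-polyp agreement between optical call and histopathology.

    Used here as a stand-in for "concordance" (an assumption).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no polyps supplied")
    return sum(1 for pred, truth in pairs if pred == truth) / len(pairs)


# Made-up validation set of 20 diminutive polyps.
polyps = (
    [(False, False)] * 9    # called non-adenoma, histology agrees
    + [(False, True)] * 1   # called non-adenoma, histology shows adenoma
    + [(True, True)] * 9    # called adenoma, histology agrees
    + [(True, False)] * 1   # called adenoma, histology shows non-adenoma
)

npv = npv_for_adenoma(polyps)
agreement = agreement_with_histology(polyps)
meets = {
    "diagnose_and_leave": npv >= ASGE_THRESHOLD,
    "resect_and_discard": agreement >= ASGE_THRESHOLD,
}
print(f"NPV={npv:.0%}, agreement={agreement:.0%}, meets benchmarks: {meets}")
```

A real evaluation would also report confidence intervals, since a point estimate sitting exactly at 90% on a small sample says little about whether the benchmark is reliably met.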
Similarly, while CADe use in colonoscopy is associated with additional up-front costs, a modeling study suggests that its associated gains in ADR (as detailed in Table 1) make it a cost-saving strategy for colorectal cancer prevention in the long term.18 There is still uncertainty on whether the incremental CADe-associated gains in adenoma detection will necessarily translate to significant reductions in interval colorectal cancer risk, particularly for endoscopists who are already high-performing polyp detectors. A recent study suggests that, although higher ADRs were associated with lower rates of interval colorectal cancer, the gains in interval colorectal cancer risk reduction appeared to level off with ADRs above 35%-40% (this finding may be limited by statistical power).19 Further, most of the data from CADe trials suggest that gains in adenoma detection are not driven by increased detection of advanced lesions with high malignancy risk but by small polyps with long latency periods of about 5-10 years, which may not significantly alter interval cancer risk. It remains to be determined whether adoption of CADe will have an impact on hard outcomes, most importantly interval colorectal cancer risk, or merely result in increased resource utilization without moving the needle on colorectal cancer prevention. To answer this question, the OperA study – a large-scale randomized clinical trial of 200,000 patients across 18 centers from 13 countries – was launched in 2022. It will investigate the effect of colonoscopy with CADe on a number of critical measures, including long-term interval colon cancer risk.20
Despite commercial availability of regulatory-approved CADe systems and data supporting use for adenoma detection in colonoscopy, mainstream adoption in clinical practice has been sluggish. Physician survey studies have shown that, although there is considerable interest in integrating CADe into clinical practice, there are concerns about access, cost and reimbursement, integration into clinical workflow, increased procedural times, over-reliance on AI, and algorithmic bias leading to errors.21,22 In addition, without mandatory requirements for ADR reporting or clinical practice guideline recommendations for CADe use, these systems may not be perceived as valuable or ready for prime time even though the evidence suggests otherwise.23,24 For CADe systems to see widespread adoption in clinical practice, it is important that future research studies rigorously investigate and characterize these potential barriers to better inform strategies to address AI hesitancy and implementation challenges. Such efforts can provide an integration framework for future AI applications in gastroenterology beyond colonoscopy, such as CADe of esophageal and gastric premalignant lesions in upper endoscopy, CADx for pancreatic cysts and liver lesions on imaging, NLP tools to optimize clinical documentation and reporting, and many others.
Dr. Uche-Anya is in the division of gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston. Dr. Berzin is with the Center for Advanced Endoscopy, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston. Dr. Berzin is a consultant for Wision AI, Medtronic, Magentiq Eye, RSIP Vision, and Docbot.
Corresponding Author: Eugenia Uche-Anya eucheanya@mgh.harvard.edu Twitter: @UcheAnyaMD @tberzin
The guidelines emphasize four-hour gastric emptying studies over two-hour testing. How do you see this affecting diagnostic workflows in practice?
Dr. Staller: Moving to a four-hour solid-meal scintigraphy will actually simplify decision-making. The two-hour reads miss a meaningful proportion of delayed emptying; standardizing on four hours reduces false negatives and the “maybe gastroparesis” purgatory that leads to repeat testing. Practically, it means closer coordination with nuclear medicine (longer slots, consistent standardized meal), updating order sets to default to a four-hour protocol, and educating front-line teams so patients arrive appropriately prepped. The payoff is fewer equivocal studies and more confident treatment plans.
Metoclopramide and erythromycin are the only agents conditionally recommended for initial therapy. How does this align with what is being currently prescribed?
Dr. Staller: This largely mirrors real-world practice. Metoclopramide remains the only FDA-approved prokinetic for gastroparesis, and short “pulsed” erythromycin courses are familiar to many of us—recognizing tachyphylaxis limits durability. Our recommendation is “conditional” because the underlying evidence is modest and patient responses are heterogeneous, but it formalizes what many clinicians already do: start with metoclopramide (lowest effective dose, limited duration, counsel on neurologic adverse effects) and reserve erythromycin for targeted use (exacerbations, bridging).
Several agents, including domperidone and prucalopride, received recommendations against first-line use. How will that influence discussions with patients who ask about these therapies?
Dr. Staller: Two points I share with patients: evidence and access/safety. For domperidone, the data quality is mixed, and US access is through an FDA IND mechanism; you’re committing patients to EKG monitoring and a non-trivial administrative lift. For prucalopride, the gastroparesis-specific evidence isn’t strong enough yet to justify first-line use. So, our stance is not “never,” it’s just “not first.” If someone fails or cannot tolerate initial therapy, we can revisit these options through shared decision-making, setting expectations about benefit, monitoring, and off-label use. The guideline language helps clinicians have a transparent, evidence-based conversation at the first visit.
The guidelines suggest reserving procedures like G-POEM and gastric electrical stimulation for refractory cases. In your practice, how do you decide when a patient is “refractory” to medical therapy?
Dr. Staller: I define “refractory” with three anchors.
1. Adequate trials of foundational care: dietary optimization and glycemic control; an antiemetic; and at least one prokinetic at appropriate dose/duration (with intolerance documented if stopped early).
2. Persistent, function-limiting symptoms: ongoing nausea/vomiting, weight loss, dehydration, ER visits/hospitalizations, or malnutrition despite the above—ideally tracked with a validated instrument (e.g., GCSI) plus nutritional metrics.
3. Objective correlation: delayed emptying on a standardized four-hour solid-meal study that aligns with the clinical picture (and medications that slow emptying addressed).
At that point, referral to a center with procedural expertise for G-POEM or consideration of gastric electrical stimulation becomes appropriate, with multidisciplinary evaluation (GI, nutrition, psychology, and, when needed, surgery).
What role do you see dietary modification and glycemic control playing alongside pharmacologic therapy in light of these recommendations?
Dr. Staller: They’re the bedrock. A small-particle, lower-fat, calorie-dense diet—often leaning on nutrient-rich liquids—can meaningfully reduce symptom burden. Partnering with dietitians early pays dividends. For diabetes, tighter glycemic control can improve gastric emptying and symptoms; I explicitly review medications that can slow emptying (e.g., opioids; consider timing/necessity of GLP-1 receptor agonists) and encourage continuous glucose monitor-informed adjustments. Pharmacotherapy sits on top of those pillars; without them, medications will likely underperform.
The guideline notes “considerable unmet need” in gastroparesis treatment. Where do you think future therapies or research are most urgently needed?
Dr. Staller: I see three major areas.
1. Truly durable prokinetics: agents that improve emptying and symptoms over months, with better safety than legacy options (e.g., next-gen motilin/ghrelin agonists, better-studied 5-HT4 strategies).
2. Endotyping and biomarkers: we need to stop treating all gastroparesis as one disease. Clinical, physiologic, and microbiome/omic signatures that predict who benefits from which therapy (drug vs G-POEM vs GES) would transform care.
3. Patient-centered trials: larger, longer RCTs that prioritize validated symptom and quality-of-life outcomes, include nutritional endpoints, and reflect real-world medication confounders.
Our guideline intentionally highlights these gaps to hopefully catalyze better trials and smarter referral pathways.
Dr. Staller is with the Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston.

