Volume 35, No. 1 • February 2020

PRO-CON DEBATE – CON: Artificial Intelligence is Not a Magic Pill

Piyush Mathur, MD, FCCM

Related Article:

PRO: Artificial Intelligence (AI) in Health Care


This Pro-Con Debate took place at the 2019 Stoelting Conference entitled “Patient Deterioration: Early Recognition, Rapid Intervention, and the End of Failure to Rescue.” The following two authors have expertise in adopting artificial intelligence for the management of patients who are deteriorating in the hospital setting.

Artificial intelligence (AI) is supposed to hold the promise of solving many of the problems facing health care, such as predicting morbidity and mortality and outperforming physicians at diagnosis. In reality, despite increasing research, only a limited number of AI algorithms have been clinically validated. Even as the number of U.S. Food and Drug Administration-approved AI applications grows, the implementation and widespread use of these applications have been challenging. Computer scientist Rodney Brooks has described some of the pitfalls of AI predictions, including overestimating or underestimating solutions, imagining magical algorithms, the scale of deployment, and performance limitations.1,2

AI performance limitations are especially important in diagnostic AI solutions. Many researchers using artificial neural networks have claimed to improve diagnosis and to outperform clinicians, as in the diagnosis of diseases visualized on chest X-rays.3 Often these self-limited, narrow-spectrum algorithms can detect lesions such as atelectasis or infiltrates on chest X-rays. Despite claims of high accuracy, however, these applications have been hard to replicate and generalize.4 In supervised approaches to machine learning, the computer algorithm learns from clinician-labeled data. In many of the publicly available chest X-ray datasets underpinning these algorithms, lesions are labeled by radiologists as infiltrates, mass, atelectasis, etc. These clinician assessments are considered the “gold standard,” but significant inter-rater differences have been noted,5 raising the specter of mislabeled datasets. Algorithms trained on such mislabeled datasets are likely to produce significant errors in their results, which can confound clinician decision-making.
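
To make the label-noise concern concrete, the following minimal sketch (not from the article; it assumes Python with numpy and scikit-learn installed) trains the same simple classifier on synthetic data while flipping an increasing fraction of training labels, mimicking inter-rater disagreement in a “gold standard” dataset:

```python
# Hypothetical sketch: how label noise, analogous to inter-rater
# disagreement in chest X-ray annotations, degrades a classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for an imaging dataset: features X, true labels y.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for noise_rate in (0.0, 0.1, 0.3):
    # Flip a fraction of training labels to mimic mislabeled training data.
    flip = rng.random(len(y_train)) < noise_rate
    y_noisy = np.where(flip, 1 - y_train, y_train)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_noisy)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"label-noise rate {noise_rate:.0%}: test accuracy {acc:.3f}")
```

As the fraction of mislabeled training examples grows, test accuracy against the true labels drops, even though the model never “sees” the errors it inherits.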

AI-based prediction of disease is similarly problematic. In the research on prediction of acute kidney injury by Tomasev et al., prediction bias was introduced through the dataset itself: their U.S. Veterans Affairs dataset contained only 6.4% female patients, and model performance in these patients was lower than in the rest.6 Bias continues to be a challenge even in administrative datasets and in solutions developed for use by health care executives or insurance companies. As demonstrated by Obermeyer et al., these biases can be introduced at the level of algorithm development, but can also stem from the dataset used or the way the algorithm is implemented.7 Such biased algorithms can lead to the delivery of improper, unsafe treatment to our patients.
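
A routine subgroup audit can surface exactly this kind of dataset bias. The sketch below is illustrative only: the simulated cohort and risk scores are assumptions, not data from the cited study; only the roughly 6% female skew echoes the Veterans Affairs dataset.

```python
# Hypothetical sketch of a per-subgroup performance audit.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 10_000
# Simulated cohort skewed like the VA dataset: ~6.4% female.
sex = rng.choice(["F", "M"], size=n, p=[0.064, 0.936])
y_true = rng.binomial(1, 0.1, size=n)  # 1 = developed acute kidney injury

# Simulate a risk score that is less informative for the under-represented
# group, as can happen when a model sees few training examples from it.
signal = np.where(sex == "M", 1.5, 0.5)
y_score = y_true * signal + rng.normal(size=n)

for group in ("F", "M"):
    mask = sex == group
    auc = roc_auc_score(y_true[mask], y_score[mask])
    print(f"sex={group}: n={mask.sum():5d}, AUC={auc:.3f}")
```

Reporting discrimination separately for each subgroup, rather than one pooled number, is what exposes the performance gap.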

Indeed, poor predictive values continue to limit the adoption of well-researched AI algorithms. Results based on the “area under the curve” (AUC), a statistical measure of how well a model discriminates between outcomes, have been exploited extensively to report the accuracy of these algorithms. However, multiple other parameters should be considered, including sensitivity and positive predictive value. Without good predictive values and replicable results, AI algorithms are unlikely to be adopted by clinicians.8
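
A small numerical sketch illustrates why AUC alone can mislead (hypothetical numbers; the 1% event rate, score model, and alert threshold are assumptions for illustration): with a rare outcome, a model can post an AUC near 0.9 while most of its alerts are false positives.

```python
# Hypothetical sketch: high AUC does not guarantee a usable positive
# predictive value when the outcome is rare.
import numpy as np
from sklearn.metrics import precision_score, roc_auc_score

rng = np.random.default_rng(0)
n = 100_000
y_true = rng.binomial(1, 0.01, size=n)       # 1% event rate, e.g., sepsis
y_score = y_true * 1.8 + rng.normal(size=n)  # reasonably discriminative score

threshold = np.quantile(y_score, 0.95)       # alert on the top 5% of scores
y_alert = (y_score >= threshold).astype(int)

print(f"AUC: {roc_auc_score(y_true, y_score):.3f}")
print(f"PPV of alerts: {precision_score(y_true, y_alert):.3f}")
print(f"Sensitivity: {y_alert[y_true == 1].mean():.3f}")
```

In this simulation the AUC is about 0.90, yet only roughly one alert in ten is a true positive, the kind of gap that drives alert fatigue at the bedside.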

Scalability and generalizability of AI algorithms are another major challenge in health care. While electronic health records are the primary means of deploying many of these algorithms, poor interfaces, limited support for information technology teams, and a lack of integrated solutions still limit the ease of adoption.

Marketing and hype created by some organizations have also had a negative impact, eroding the credibility of AI among many clinicians. Some well-researched breakthroughs have been hyped enormously to leverage the market value currently associated with AI. In a survey of European startups claiming to use AI, conducted by the London venture capital firm MMC Ventures, 40% were not actually using AI as part of their product.9

AI does hold the promise of delivering potentially safer solutions for health care by using the ever-increasing volume of data in an efficient and reproducible manner. But realizing this potential requires clinician leadership and rigorous clinical validation throughout the development and deployment of AI algorithms (Table 1).

Table 1: Solutions for Effective Deployment of AI in Health Care

Patient- and care-provider-centric—first do no harm
Clinician leadership
Rigorous model development and testing
Explainable or interpretable solutions—avoidance of the “black box”
Clinical validation for generalizability and scalability
Cost-effective solutions

We are still in the early phases of research and development of AI algorithms for health care. Clearly, the growth in AI has been exponential, and the pace is likely to continue in the near future. We need to be prepared to dedicate clinical, information technology, and financial resources to see effective utilization of these remarkable algorithms. Clinicians, especially radiologists and oncologists, are already leading the development of many AI algorithms to prevent ill-prepared solutions from creeping into their work environments. Anesthesia professionals and perioperative clinicians, who have been early adopters of technology and work in a data-rich environment, also need to lead the research, development, and deployment of sustainable AI algorithms to provide safer care to our patients.

 

Dr. Mathur is a staff anesthesiologist/intensivist in the Department of General Anesthesiology and the quality improvement officer of the Anesthesiology Institute, Cleveland Clinic, Cleveland, Ohio.


The author has no conflicts of interest to disclose.


References

  1. Brooks R. The seven deadly sins of AI predictions. MIT Technology Review. 2017. https://www.technologyreview.com/s/609048/the-seven-deadly-sins-of-ai-predictions/. Accessed December 9, 2019.
  2. Panetta K. 5 trends appear on the Gartner Hype Cycle for Emerging Technologies, 2019. Gartner. 2019. https://www.gartner.com/smarterwithgartner/5-trends-appear-on-the-gartner-hype-cycle-for-emerging-technologies-2019/. Accessed August 29, 2019.
  3. Rajpurkar P, Irvin J, Ball RL, et al. Deep learning for chest radiograph diagnosis: a retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Med. 2018;15:e1002686.
  4. Zech JR, Badgeley MA, Liu M, et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 2018;15:e1002683.
  5. Oakden-Rayner L. Exploring large-scale public medical image datasets. Acad Radiol. 2019.
  6. Tomasev N, Glorot X, Rae JW, et al. A clinically applicable approach to continuous prediction of future acute kidney injury. Nature. 2019;572:116–119.
  7. Obermeyer Z, Powers B, Vogeli C, et al. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366:447–453.
  8. Ginestra JC, Giannini HM, Schweickert WD, et al. Clinician perception of a machine learning-based early warning system designed to predict severe sepsis and septic shock. Crit Care Med. 2019;47:1477–1484.
  9. Olson P. Nearly half of all ‘AI startups’ are cashing in on hype. Forbes. 2019. https://www.forbes.com/sites/parmyolson/2019/03/04/nearly-half-of-all-ai-startups-are-cashing-in-on-hype/#454f99e7d022. Accessed March 4, 2019.