In a recent study published on the medRxiv preprint server, researchers conducted a comprehensive review of randomized controlled trials (RCTs) involving artificial intelligence (AI) algorithms from 2018 to 2023. This review aimed to shed light on the clinical relevance of AI in healthcare and identify areas that require further exploration. While the study found promising results, it also raised important questions about the implementation of AI in clinical practice.
The FDA’s approval of AI-enabled medical devices
The Food and Drug Administration (FDA) has approved approximately 300 AI-enabled medical devices, citing research studies that showed them outperforming clinicians. However, a notable concern emerged when some widely used AI models, such as a sepsis prediction model, were found to perform worse than their developers had initially reported. This discrepancy led to incorrect alerts, raising doubts about the real-world effectiveness of AI in clinical settings.
The scoping review uncovers trends in AI RCTs
In their scoping review, researchers aimed to provide insights into AI’s impact on clinical practice by analyzing RCTs published between 2018 and 2023. They used specific keywords related to AI, clinicians, and clinical trials to identify relevant studies published in English on PubMed and the International Clinical Trials Registry Platform (ICTRP).
The criteria for inclusion in the study were as follows:
1. Use of a non-linear computational model based on AI as an intervention.
2. Integration of AI-based intervention into clinical practice, resulting in an impact on patient health.
3. Publication as a full-text peer-reviewed article.
Two independent investigators first screened the studies and then carried out full-text screening, with discrepancies resolved through discussion with a third reviewer. From each eligible RCT, the researchers collected information on the study site, clinical task, results, and the type of AI used. The studies were categorized by primary endpoint (such as care management), medical specialty, and the data modality the AI used.
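The three inclusion criteria above amount to a simple all-or-nothing filter. The sketch below is only an illustration of that logic, not the reviewers' actual tooling: the `TrialRecord` class, its field names, and the sample titles are hypothetical, and in the real review each judgment was made by human screeners rather than read off a flag.

```python
from dataclasses import dataclass


@dataclass
class TrialRecord:
    """Hypothetical screening record; field names are illustrative only."""
    title: str
    uses_nonlinear_ai_model: bool    # criterion 1: AI-based non-linear model as intervention
    integrated_into_practice: bool   # criterion 2: used in clinical practice, affects patient health
    full_text_peer_reviewed: bool    # criterion 3: published as a full-text peer-reviewed article


def meets_inclusion_criteria(record: TrialRecord) -> bool:
    """Return True only when all three inclusion criteria hold."""
    return (
        record.uses_nonlinear_ai_model
        and record.integrated_into_practice
        and record.full_text_peer_reviewed
    )


def screen(records: list[TrialRecord]) -> list[TrialRecord]:
    """Keep the records that pass every criterion, preserving input order."""
    return [r for r in records if meets_inclusion_criteria(r)]


# Made-up example records, not trials from the review.
candidates = [
    TrialRecord("Hypothetical trial A", True, True, True),
    TrialRecord("Hypothetical conference abstract B", True, True, False),
]
eligible = screen(candidates)
print([r.title for r in eligible])
```

A record failing any single criterion is dropped, which mirrors how the review excluded, for example, studies that were not published as full-text peer-reviewed articles.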
Notable trends in AI RCTs
The study analyzed a total of 84 RCTs, revealing several notable trends in the development of AI in clinical practice:
Medical specialties
– Gastroenterology-related RCTs were the most prevalent (35/84), followed by radiology, surgery, and cardiology.
– Gastroenterology-related RCTs were primarily conducted by four research groups from Wuhan University, Wision AI, Medtronic, and Fujifilm. These studies were notable for their uniformity and testing of video-based machine learning (ML) algorithms with clinician involvement.
Geographic distribution
– The United States led in the number of RCTs conducted, followed by China. Most RCTs were single-site studies.
– China predominantly conducted gastroenterology-related RCTs, whereas RCTs in the United States covered multiple medical specialties.
– Multi-center RCTs were primarily conducted in European nations, but single-site RCTs, which evaluated an average of 359 patients, predominated in the final study set.
RCT outcomes
– Most RCTs evaluating AI-based medical devices in clinical practice reported positive outcomes for their primary endpoints (69/84), a higher share than in earlier reviews of RCTs for AI in healthcare. This high success rate lends credibility to clinical AI.
– However, it’s essential to acknowledge that the early stage of the field and potential publication bias may have influenced these observations.
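For readers who prefer percentages, the counts reported in the review (35 of 84 trials in gastroenterology, 69 of 84 with a positive primary endpoint) can be converted with a few lines of arithmetic. This is a minimal sketch; the function and variable names are our own and do not come from the study.

```python
TOTAL_RCTS = 84  # number of RCTs analyzed in the review


def share(count: int, total: int = TOTAL_RCTS) -> float:
    """Return count as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


gastro_share = share(35)    # gastroenterology-related RCTs
positive_share = share(69)  # RCTs with a positive primary endpoint

print(f"Gastroenterology RCTs: {gastro_share:.1f}%")         # prints 41.7%
print(f"Positive primary endpoints: {positive_share:.1f}%")  # prints 82.1%
```

These percentages describe only the trials in the final study set, so the caveat about publication bias applies to them as well.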
The need for clinically meaningful endpoints
– While RCTs assessing the diagnostic accuracy of AI algorithms showed promising results, higher diagnostic accuracy does not necessarily translate into improved patient outcomes.
– The study emphasizes the importance of incorporating clinically meaningful endpoints such as patient symptoms, survival, and treatment needs in future RCTs evaluating AI algorithms in healthcare.
The study reveals a growing interest in AI applications across diverse medical specialties and locations. The FDA’s approval of numerous AI-enabled medical devices underscores the potential of AI in improving healthcare. However, the study also highlights critical concerns that must be addressed:
1. Performance discrepancies: The gap between reported AI performance and real-world results, as seen in the sepsis model, necessitates further investigation and refinement.
2. Geographic diversity: To ensure the validity of AI systems across diverse populations and healthcare systems, there is a clear need for more multi-center international trials.
3. Clinically meaningful endpoints: While diagnostic accuracy is essential, future RCTs should prioritize endpoints that directly impact patient outcomes, ultimately demonstrating AI’s true value in clinical practice.
As AI continues to evolve and find its place in healthcare, ongoing research and collaboration among stakeholders are crucial to maximize its benefits while addressing its limitations. The study serves as a valuable roadmap for the future of AI in clinical practice, emphasizing the need for a more comprehensive and patient-centered approach.