LLM-based generative chat tools, such as ChatGPT and Google’s MedPaLM, show great promise in the medical field. However, the unregulated use of AI chatbots poses inherent risks. A recent paper addresses the urgent international issue of regulating large language models (LLMs) in general and in healthcare specifically. Professor Stephen Gilbert, a Medical Device Regulatory Science expert, emphasizes the need to develop new frameworks that prioritize patient safety when utilizing these powerful chatbots.
LLMs are neural network language models known for their remarkable conversational abilities: they generate human-like responses and engage in interactive dialogue. A key concern, however, is that they often provide highly convincing yet incorrect or inappropriate information. There is currently no reliable way to determine the quality, evidence level, or consistency of the clinical information and supporting evidence these chatbots provide. Relying on them for medical advice can therefore be unsafe.
Integration of chatbots with search engines
Research shows that many individuals search for their symptoms online before seeking medical advice, often relying on search engine results. The upcoming integration of LLM-based chatbots into search engines may further increase users’ confidence in the answers these conversational tools provide. However, studies have demonstrated that LLMs can produce profoundly dangerous information when confronted with medical questions. This underscores the pressing need for regulatory control over the integration of LLMs with search engines.
An inherent issue with LLMs lies in their lack of a medical “ground truth” model, making them intrinsically risky. Instances have already occurred where chat-interfaced LLMs provided harmful medical responses or were unethically used in experiments on patients without proper consent. Almost every medical use case involving LLMs requires regulatory oversight in the EU and US. In the US, their lack of explainability disqualifies them from being classified as “non-devices.” No current LLM offers explainability, low bias, predictability, correctness, or verifiable outputs. Consequently, they are not exempt from current or future governance approaches.
Potential applications under existing frameworks
The paper’s authors outline the limited scenarios in which LLMs could be applied under current regulatory frameworks. They also describe how developers can work toward LLM-based tools that could obtain approval as medical devices. However, current LLM-based chatbots do not meet essential principles for AI in healthcare, such as bias control, explainability, oversight systems, validation, and transparency. To earn their place in the medical field, chatbots must be designed for improved accuracy, with their safety and clinical efficacy demonstrated to, and approved by, regulators.
Regulating AI chatbots, particularly LLM-based ones, as medical devices is vital to ensuring patient safety in healthcare. The risks associated with their unregulated use highlight the necessity of frameworks that address accuracy, safety, and clinical efficacy. To establish trust in LLM-based tools, developers must adhere to the key principles for AI in healthcare, and collaboration between developers and regulators is crucial to establishing guidelines that promote the responsible and ethical deployment of LLM-based chatbots in the medical realm.