Do AI Chatbots Give Reliable Advice Regarding Cancer?

November 2023, Vol 13, No 11

Artificial intelligence (AI) chatbots showed mixed results when it came to providing treatment strategies and direct-to-patient cancer advice for a variety of malignancies, according to 2 studies recently published in JAMA Oncology.

The results from the first study, which assessed cancer treatment recommendations, showed that AI chatbots overall missed the mark on providing recommendations for breast, prostate, and lung cancers in accordance with national treatment guidelines.1

The findings from the second study, which evaluated responses to common cancer-related Google searches, were more positive, with the researchers reporting that the chatbots generally provided accurate information to consumers, although they noted that the usefulness of the information may be limited by its complexity.2

Cancer Treatment Recommendations Study

In this study, Danielle S. Bitterman, MD, Assistant Professor, Radiation Oncology, Harvard Medical School, Boston, MA, and colleagues created 4 prompt templates for 26 cancer diagnosis descriptions (for a total of 104 prompts) and evaluated the validity of ChatGPT-3.5 treatment recommendations for breast, prostate, and lung cancer against the 2021 National Comprehensive Cancer Network (NCCN) Guidelines®. Several oncologists then assessed the level of concordance between the chatbot responses and these guidelines. The researchers noted that, in accordance with the Common Rule, institutional review board approval was not required because the study did not involve human participants.

The chatbot provided a treatment recommendation for 98% of prompts. All outputs that included a recommendation contained at least 1 NCCN-concordant treatment, but 34.3% of these outputs also recommended 1 or more nonconcordant treatments. In addition, approximately 12.5% of recommended treatments were “hallucinated,” that is, not part of any NCCN-recommended treatment. Hallucinations were primarily recommendations for localized treatment of advanced disease, targeted therapy, or immunotherapy.

Based on their findings, the investigators recommended that clinicians advise patients that AI chatbots are not a reliable source of cancer treatment information.

“The chatbot did not purport to be a medical device, and need not be held to such standards. However, patients will likely use such technologies in their self-education, which may affect shared decision making and the patient-clinician relationship. Developers should have some responsibility to distribute technologies that do not cause harm, and patients and clinicians need to be aware of these technologies’ limitations,” Dr Bitterman and colleagues concluded.

Developers should have some responsibility to distribute technologies that do not cause harm, and patients and clinicians need to be aware of these technologies’ limitations.

—Danielle S. Bitterman, MD, and colleagues

Consumer Health Information Study

In this cross-sectional study, Abdo E. Kabarriti, MD, Assistant Professor, Urology, State University of New York Downstate Health Sciences University, Brooklyn, and colleagues analyzed the quality of responses to the 5 most commonly searched questions on skin, lung, breast, colorectal, and prostate cancer provided by 4 AI chatbots: ChatGPT-3.5, Perplexity (Perplexity.AI), Chatsonic (Writesonic), and Bing AI (Microsoft).

The findings of this cross-sectional study suggest that AI chatbots generally produce accurate information for the top cancer-related search queries, but the responses are not readily actionable and are written at a college reading level.

—Abdo E. Kabarriti, MD, and colleagues

Outcomes included the quality of the consumer health information, assessed with the DISCERN instrument (a scale of 1-5, with 1 representing low quality), and the understandability and actionability of the information, assessed with domains of the Patient Education Materials Assessment Tool (PEMAT), which is scored from 0% to 100%, with higher scores indicating greater understandability and actionability.

The quality of the text responses generated by the 4 chatbots was good (median DISCERN score of 5, with no misinformation identified). Understandability was moderate (median PEMAT understandability score of 66.7%), but actionability was poor (median PEMAT actionability score of 20%). Three of the 4 chatbots cited reputable sources, such as the American Cancer Society, the Mayo Clinic, and the Centers for Disease Control and Prevention, which the researchers said was “reassuring.”

However, they also noted that the usefulness of the information was “limited” because responses were often written at a college reading level. Another limitation was that the AI chatbots provided concise answers with no visual aids, which may not be sufficient to explain more complex ideas to consumers.

“The findings of this cross-sectional study suggest that AI chatbots generally produce accurate information for the top cancer-related search queries, but the responses are not readily actionable and are written at a college reading level. These limitations suggest that AI chatbots should be used supplementarily and not as a primary source for medical information,” Dr Kabarriti and colleagues concluded.

References

  1. Chen S, Kann BH, Foote MB, et al. Use of artificial intelligence chatbots for cancer treatment information. JAMA Oncol. 2023;9:1459-1462.
  2. Pan A, Musheyev D, Bockelman D, et al. Assessment of artificial intelligence chatbot responses to top searched queries about cancer. JAMA Oncol. 2023;9:1437-1440.
