Study Reveals Public Favors Chatbot Answers Over Those of Physicians

May 30, 2025 - Faculty/Research

Illustration: a robot doctor and a human doctor stand side by side in white lab coats, symbolizing the integration of artificial intelligence and human expertise in medicine.

A study by a team of researchers from the Naveen Jindal School of Management reveals that when laypeople compare answers from licensed physicians to those from AI chatbots like ChatGPT, they consistently prefer the AI — even when experts rate the chatbot’s answers as lower in clinical quality.

The working paper — “What People Think of Machines as Doctors: Unveiling the Value of Gen-AI for e-Health” — was written by Dr. Mehmet Ayvaci, an associate professor in the Jindal School’s Information Systems Area; Dr. Alejandro Zentner, an associate professor in the Jindal School’s Finance and Managerial Economics Area; and Dr. Dicle Yagmur Ozdemir, PhD’23, of Rotterdam School of Management at Erasmus University Rotterdam.

The study comes at a time when AI tools are being rapidly integrated into public-facing platforms — advancements moving faster than the traditional academic publishing process, which can take years from submission to print. This dynamic highlights a growing tension in how the scientific community evaluates and disseminates insights on emerging technologies.

In their experiments, the team found that study participants consistently rated ChatGPT-generated answers as more helpful and convincing than those written by licensed physicians, even when medical experts judged the chatbot's answers to be lower in clinical quality.


“That might sound surprising, but it makes sense when you think about how people without medical training judge information,” Ayvaci said. “They tend to focus on things they can evaluate, like how clear or respectful the response is, rather than how medically accurate it is. ChatGPT usually gives longer, more detailed, and polite responses, which makes it feel more helpful. So even if the content is not always better, the style makes people trust it more.”

The study observes that when people read something outside their area of expertise, such as a medical answer, they often look for clues to help judge it. One of those clues is length. A longer response can appear to reflect more effort or expertise, even if that is not actually the case.


“In behavioral economics, this is known as attribute substitution,” Zentner said. “Instead of evaluating the true medical accuracy, which is difficult for non-experts, people substitute an easier feature, such as how long or detailed the response is. So length ends up shaping trust, not because it guarantees accuracy, but because it feels like quality.”
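To make the idea of attribute substitution concrete, here is a small illustrative sketch in Python. It is entirely hypothetical and not drawn from the study's data or code; the numbers, field names, and the length-based scoring rule are invented for demonstration.

# Toy example of attribute substitution: a lay reader scores an answer by its
# length (an easy-to-judge cue), while an expert scores clinical quality.
# All numbers and field names here are invented for illustration.

answers = [
    {"source": "physician", "word_count": 45,  "expert_quality": 4.6},
    {"source": "chatgpt",   "word_count": 190, "expert_quality": 3.9},
]

for a in answers:
    # Perceived quality rises with length and is capped at 5 (a made-up heuristic).
    perceived = min(5.0, 1.0 + a["word_count"] / 50)
    print(f'{a["source"]:9s}  perceived: {perceived:.1f}   expert: {a["expert_quality"]:.1f}')

In this toy setup, the longer chatbot answer scores higher on the easy-to-judge cue even though the expert rates it lower, which is the substitution pattern the researchers describe.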

Relying too much on how a response is perceived — such as how polite, detailed, or well-written it is — can sometimes be misleading, the researchers note.

“While many AI responses are accurate, there are cases where the advice sounds convincing but does not align with clinical standards,” Ayvaci said. “If a patient acts on such information, there is a risk of being misinformed, which could potentially lead to harm. In our study, people often preferred AI responses regardless of whether they were rated lower in quality by medical experts. This gap does not always occur and may not always result in negative outcomes, but it does raise concerns about the potential for style to overshadow substance in health communication.”

To ensure that the experiment reflected real-life patient questions and interactions, the team used actual patient questions that had been posted publicly on Reddit’s r/AskDocs forum, where people often turn for informal medical advice.

“These are real concerns from real people, not hypothetical scenarios,” Zentner said. “Each question in our experiment had two responses — one from a verified physician who answered the post online, and another generated by ChatGPT using the same prompt. By comparing how non-experts reacted to these answers, we were able to capture a realistic picture of how people evaluate health advice in online settings. This setup helped us study behavior that closely mirrors how patients engage with medical content on the internet.”
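As a rough sketch of how such a head-to-head comparison might be tallied, consider the Python snippet below. The identifiers and choices are fabricated for illustration and do not represent the study's actual data or analysis code.

# Each question carries one physician reply and one ChatGPT reply; lay raters
# pick the answer they find more helpful. Data below are fabricated examples.

from collections import Counter

lay_choices = [
    ("question_01", "chatgpt"),
    ("question_01", "chatgpt"),
    ("question_02", "physician"),
    ("question_02", "chatgpt"),
    ("question_03", "chatgpt"),
    ("question_03", "chatgpt"),
]

counts = Counter(source for _, source in lay_choices)
total = sum(counts.values())
for source in ("chatgpt", "physician"):
    share = counts[source] / total
    print(f"{source}: preferred in {counts[source]} of {total} ratings ({share:.0%})")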

The study finds that disclosing the source of the response (ChatGPT versus human) changes people’s trust levels. Importantly, participants were not shown explicit medical errors or inaccuracies to provoke a change in trust, Ayvaci said.

“They simply learned whether a response came from ChatGPT or a physician,” he said. “Even without seeing mistakes, just knowing the response was machine-generated lowered their trust, especially when the AI response was lower in quality. This reflects a form of algorithm aversion, where people are more skeptical of advice from machines, sometimes due to general discomfort or low confidence in technology. Interestingly, this aversion helped in certain cases by making people more cautious when the AI response did not meet expert standards. So while source disclosure is a small change, it can meaningfully influence how people judge the information they receive.”
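A stylized way to picture that disclosure effect is to compare trust ratings for the same AI-written answers with the source hidden versus revealed. The sketch below uses made-up scores, not anything from the paper, purely to show the shape of the comparison.

# Hypothetical 1-5 trust ratings for the same AI-written answers, shown either
# without a source label or explicitly labeled as ChatGPT-generated.
# Every number here is invented for illustration only.

unlabeled = [4.2, 4.4, 4.1, 4.3, 4.5]   # raters do not know the source
labeled   = [3.6, 3.9, 3.5, 3.8, 3.7]   # raters told the answer is from ChatGPT

def mean(values):
    return sum(values) / len(values)

shift = mean(labeled) - mean(unlabeled)
print(f"Average trust, source hidden:    {mean(unlabeled):.2f}")
print(f"Average trust, source disclosed: {mean(labeled):.2f}")
print(f"Change after disclosure:         {shift:+.2f}")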

Patients’ familiarity with the healthcare system also influences how they evaluate AI-generated answers compared with those from humans. The researchers found that people who are more familiar with the system, such as those who visit doctors regularly or manage ongoing health needs, tend to evaluate answers more carefully.

“They pay attention to things like clarity, helpfulness, respectfulness and whether the response addresses the question in detail,” Zentner said. “In other words, they focus on communication features that are recognized as important by key stakeholders in healthcare. In contrast, people with less experience are more likely to rely on surface-level cues, such as how long the response is. For them, the perceived effort behind the answer has a direct impact on how trustworthy it feels. This difference helps explain why the same AI response can be judged very differently by different people.”

The study also reports that education seems to shape how people respond to learning that a response comes from AI.

“In our study, participants with higher education levels were less likely to show algorithm aversion,” Ayvaci said. “They were more open to trusting AI-generated responses, even after knowing the source. In contrast, those with lower levels of education were more likely to become skeptical once they found out the answer came from a machine. This pattern is consistent with prior research showing that individuals with more education tend to have more positive attitudes toward technology and are more confident in navigating digital tools. Education may influence both comfort with AI and the ability to judge information critically.”

Based on their findings, the team proposes improvements for designing AI that provides accurate and trustworthy health information.

“One clear takeaway from our study is that style matters, sometimes too much,” Zentner said. “One goal for improving AI in healthcare should be to align perceived quality with actual clinical quality. That could mean making responses more readable and respectful without relying on extra length just to sound convincing.”

The findings also suggest the need for transparency.

“Clearly indicating that an AI-generated answer has been reviewed or approved by a medical professional may help build trust without misleading people,” Ayvaci said. “Tailoring responses to different user types, such as first-time patients versus more experienced ones, also could help ensure that the advice is both understandable and appropriate for the situation.”

As for how healthcare providers and policymakers can work together to mitigate the risks of patients misinterpreting AI-generated advice, the researchers offer further recommendations.

“Whether we like it or not, there is a growing trend of patients turning to digital channels for health information and advice,” said Ozdemir, who graduated from The University of Texas at Dallas in 2023 with a PhD in Management Science with a concentration in Information Systems. “A recent study found that more than half of healthcare-related search queries now receive an AI-generated response. We are also seeing AI being used on platforms that offer basic diagnostic services. In this environment, it becomes critical to understand how people perceive and trust these tools.”

Ayvaci said healthcare providers can play a role by integrating AI tools in ways that include clinical oversight, such as reviewing AI-generated content before it reaches patients. Policymakers can help by setting standards for transparency, accuracy, and monitoring.

“They might require that AI-generated responses be clearly labeled or reviewed for clinical validity,” he said. “Working together, providers and policymakers can help ensure that patients benefit from these technologies without being misled by them.”

The study indicates that future research should focus on how patients interact with AI in real-world healthcare settings, not just in theory.

“As AI tools become more integrated into patient portals, chatbots, virtual care platforms, and even search engines, it is important to observe how people use them when making actual health decisions,” Zentner said. “We also need to understand how different populations across various socioeconomic strata respond to AI in high-stakes situations.”

Another priority, he said, is testing which design choices help promote safe use, such as adding human review, using clear disclaimers and privacy choices, or tailoring responses based on patient needs.

“As AI becomes a more visible part of healthcare, these insights will be essential for ensuring it supports patient safety and quality of care,” Ayvaci said.
