Next time you're considering consulting Dr ChatGPT, perhaps think again.
Despite now being able to ace most medical licensing exams, artificial intelligence chatbots do not give humans better health advice than they can find using more traditional methods, according to a study published on Monday.
"Despite all the hype, Artificial Intelligence (AI) just isn't ready to take on the role of the physician," study co-author Rebecca Payne from Oxford University said.
"Patients need to be aware that asking a large language model about their symptoms can be dangerous, giving wrong diagnoses and failing to recognise when urgent help is needed," she added in a statement.
The British-led team of researchers wanted to find out how successful humans are when they use chatbots to identify their health problems and to decide whether they need to see a doctor or go to hospital.
The team presented nearly 1,300 UK-based participants with 10 different scenarios, such as having a headache after a night out drinking, being a new mother who feels exhausted or experiencing the symptoms of gallstones.
Then the researchers randomly assigned the participants one of three chatbots: OpenAI's GPT-4o, Meta's Llama 3 or Cohere's Command R+. There was also a control group that used internet search engines.
People using the AI chatbots were only able to identify their health problem around a third of the time, while only around 45 per cent figured out the right course of action.
This was no better than the control group, according to the study, published in the Nature Medicine journal.
The researchers pointed out the disparity between these disappointing results and the extremely high scores AI chatbots achieve on medical benchmarks and exams, blaming the gap on a communication breakdown.
Unlike the simulated patient interactions often used to test AI, the real humans often did not give the chatbots all the relevant information.
And sometimes the humans struggled to interpret the options offered by the chatbot, or misunderstood or simply ignored its advice.
One out of every six US adults asks AI chatbots about health information at least once a month, the researchers said, with that number expected to increase as more people adopt the new technology.
"This is a very important study as it highlights the real medical risks posed to the public by chatbots," David Shaw, a bioethicist at Maastricht University in the Netherlands who was not involved in the research, told reporters.
He advised people to only trust medical information from reliable sources, such as the UK's National Health Service.
The researchers recruited 1,298 participants in Britain to use either AI or their usual resources, such as an internet search, their own experience or the National Health Service website, to investigate the symptoms and decide their next step.
When the participants did this, relevant conditions were identified in less than 34.5% of cases, and the right course of action was given in less than 44.2%, no better than the control group using more traditional tools.
Adam Mahdi, co-author of the paper and associate professor at Oxford, said the study showed the "huge gap" between the potential of AI and the pitfalls when it was used by people.
"The knowledge may be in those bots; however, this knowledge doesn't always translate when interacting with humans," he said, meaning that more work was needed to identify why this was happening.
The team studied around 30 of the interactions in detail, and concluded that often humans were providing incomplete or wrong information, but the LLMs were also sometimes generating misleading or incorrect responses.
For example, one patient reporting the symptoms of a subarachnoid haemorrhage - a life-threatening condition causing bleeding on the brain - was correctly told by the AI to go to hospital after describing a stiff neck, light sensitivity and the "worst headache ever". Another described the same symptoms but called the headache "terrible", and was told to lie down in a darkened room.
The study was supported by the data company Prolific, the German non-profit Dieter Schwarz Stiftung, and the UK and US governments.