ChatGPT creator OpenAI introduced Whisper two years ago as an AI tool that transcribes speech to text. Now the tool is used by healthcare AI company Nabla and its 45,000 physicians to help transcribe medical conversations at more than 85 organizations, including University of Iowa Health Care.
However, new research shows that Whisper has been “hallucinating,” or adding statements that no one actually said, to transcripts of those conversations, raising the question of how quickly medical facilities should adopt AI if it produces errors.
According to the Associated Press, a University of Michigan researcher found hallucinations in 80% of the Whisper transcriptions he examined. An unnamed developer found hallucinations in half of more than 100 hours of transcriptions. Another engineer found inaccuracies in nearly all of the 26,000 transcripts they created with Whisper.
Erroneous transcriptions of conversations between doctors and patients can have “really serious consequences,” Alondra Nelson, a professor at the Institute for Advanced Study in Princeton, NJ, told the AP.
“No one wants a misdiagnosis,” Nelson said.
Earlier this year, researchers at Cornell University, New York University, the University of Washington and the University of Virginia published a study that tracked how often OpenAI's speech-to-text service Whisper hallucinated when transcribing 13,140 audio segments with an average length of 10 seconds. The audio was sourced from TalkBank's AphasiaBank, a database featuring the voices of people with aphasia, a language disorder that makes communication difficult.
The researchers found 312 instances of “hallucinated phrases or whole sentences that did not exist in any form in the underlying audio” when they ran the experiment in the spring of 2023.
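For context, transcribing a short audio segment with OpenAI's open-source Whisper package looks roughly like the sketch below; the model size and file name are illustrative assumptions, not the study's exact pipeline.

```python
# Minimal sketch: transcribing one short audio clip with the open-source
# "whisper" package (pip install openai-whisper). Model size and file name
# are assumptions for illustration only.
import whisper

model = whisper.load_model("base")        # load a pretrained Whisper model
result = model.transcribe("segment.wav")  # run speech-to-text on the clip
print(result["text"])                     # the decoded transcript
```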
Of those hallucinations, 38% contained harmful language, such as references to violence or stereotypes, that did not fit the context of the conversation.
“Our work shows that there are serious concerns about the inaccuracy of Whisper due to unpredictable hallucinations,” the researchers wrote.
The researchers say the study may also imply a hallucination bias in Whisper, or a tendency for it to introduce inaccuracies more often for a certain group — and not just people with aphasia.
“Based on our findings, we suggest that this type of hallucination bias may also arise for any demographic group with more irregular speech impairments (such as speakers with other speech impairments like dysphonia [voice disorders], the elderly or non-native speakers),” the researchers said.
Whisper has transcribed seven million medical conversations through Nabla, according to The Verge.