BBC releases damning research into AI news accuracy

More than half of answers were judged to have “significant issues”

Image: Research shows AI-generated news summaries are highly error-prone

BBC research finds inaccuracies and distortions in more than half of the AI-generated answers it tested on its news stories.

The BBC has published the results of research it conducted on four of the most popular AI assistants – OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini and Perplexity AI.

The research consisted of giving each chatbot 100 BBC news stories and asking the bots to summarise the content and answer questions on it. Journalists who are experts in the subjects of the articles were asked to rate the quality of answers.

The results, according to the BBC, contained “significant inaccuracies” and distortions.

51% of all AI-generated answers were judged to have significant issues, and 19% of answers that cited BBC content contained factual mistakes, including incorrect statements, numbers and dates.

13% of the quotes sourced from BBC articles were either altered from the original or not present in the cited article at all.

In a blog post, Deborah Turness, CEO of BBC News and Current Affairs, expressed her concern about the findings of the study. She said:

“We live in troubled times, and how long will it be before an AI-distorted headline causes significant real-world harm?”

The BBC has already experienced its news content being distorted and flat-out contradicted by other AI-powered summaries. Apple was forced to shelve the news summary feature in Apple Intelligence earlier this year because the feature was generating incorrect alerts. A summary of a BBC story made it look as if the BBC had reported that Luigi Mangione, the man accused of killing UnitedHealthcare CEO Brian Thompson, had shot himself.

In her blog post, Ms Turness called on the tech companies to follow Apple’s example and pull back their summaries, although she was also careful to emphasise that she wanted to open a dialogue with AI companies and “work in partnership to find solutions.”

The BBC carried out the testing last December. Examples of errors included Gemini stating that the NHS did not recommend vaping as a smoking cessation aid; ChatGPT and Copilot both claiming that Rishi Sunak and Nicola Sturgeon were still in office; and Perplexity misquoting BBC News in a story on the Middle East.

In addition to factual mistakes, the chatbots "struggled to differentiate between opinion and fact, editorialised, and often failed to include essential context".

The BBC's Programme Director for Generative AI, Pete Archer, said publishers "should have control over whether and how their content is used, and AI companies should show how assistants process news along with the scale and scope of errors and inaccuracies they produce".

An OpenAI spokesperson told the BBC: "We support publishers and creators by helping 300 million weekly ChatGPT users discover quality content through summaries, quotes, clear links, and attribution."

The spokesperson added: "We've collaborated with partners to improve in-line citation accuracy and respect publisher preferences, including enabling how they appear in search by managing OAI-SearchBot in their robots.txt. We'll keep enhancing search results."
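For context, OAI-SearchBot is the user agent OpenAI's search crawler identifies itself with, and publishers manage it through ordinary robots.txt directives. A minimal sketch of what such an entry could look like, assuming a publisher wanted to block the crawler site-wide (the paths and policy here are illustrative, not a recommendation):

    # robots.txt — illustrative example only
    # Block OpenAI's search crawler from the entire site
    User-agent: OAI-SearchBot
    Disallow: /

    # Or, to permit it everywhere instead:
    # User-agent: OAI-SearchBot
    # Allow: /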