AI search engines plagued by inaccuracy
Chatbots provide incorrect answers to more than 60% of queries, finds study
A recent study by the Tow Center for Digital Journalism has revealed alarming inconsistencies and inaccuracies in AI search tools, challenging their growing popularity as replacements for traditional search engines.
The study found that major AI search engines frequently fabricate reference links, fail to provide sources when requested, and deliver incorrect information, particularly when citing news articles.
"Overall, the chatbots provided incorrect answers to more than 60% of queries," the study states.
The research analysed eight AI search tools: ChatGPT Search, Gemini, Perplexity, Perplexity Pro, DeepSeek Search, Microsoft's Copilot, Grok-2 Search and Grok-3 Search.
The researchers randomly selected 200 news articles from 20 different publishers and confirmed that each appeared within the top three Google search results when an exact excerpt from the story was used as the query.
They then ran the same excerpt through each AI search tool and graded the results on whether the tool correctly cited the article, the news organisation, and the URL.
The AI-generated responses were classified on a scale from "completely correct" to "completely incorrect."
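The published write-up does not include the researchers' grading code; as a rough illustration only, the rubric can be sketched as a comparison of the three cited attributes. The function name, field names, and the collapsed three-point scale below are hypothetical simplifications, not the Tow Center's actual methodology.

```python
# Illustrative sketch of a citation-grading rubric (hypothetical; not the
# Tow Center's implementation). An AI tool's answer is compared against
# the known article title, publisher, and URL.

def grade_response(answer: dict, truth: dict) -> str:
    """Classify a response on a simplified correct-to-incorrect scale."""
    checks = [
        answer.get("article_title") == truth["article_title"],
        answer.get("publisher") == truth["publisher"],
        answer.get("url") == truth["url"],
    ]
    if all(checks):
        return "completely correct"
    if any(checks):
        # Some attributes match, others are wrong or missing.
        return "partially correct"
    return "completely incorrect"

# Example: right headline and publisher, but a fabricated link.
truth = {"article_title": "Example headline",
         "publisher": "Example News",
         "url": "https://example.com/story"}
answer = {"article_title": "Example headline",
          "publisher": "Example News",
          "url": "https://example.com/made-up-link"}
print(grade_response(answer, truth))  # -> partially correct
```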
Only Perplexity and Perplexity Pro performed at a relatively acceptable level. The rest failed at an alarming rate, with some AI tools confidently reinforcing misinformation.
X's Grok-3 Search was incorrect 96% of the time.
Microsoft's Copilot also fared poorly, declining to answer 104 of the 200 queries. Of the 96 it did answer, only 16 were "completely correct" and 14 "partially correct," while 66, roughly 70% of the answers it actually gave, were "completely incorrect."
ChatGPT Search, one of the more responsive tools, also struggled with accuracy: it answered all 200 queries but was rated "completely correct" only 28% of the time and "completely incorrect" 57% of the time.
The study supports ongoing concerns that AI models not only fabricate information but do so with unwavering confidence. These so-called "hallucinations" are an acknowledged flaw in large language models (LLMs), but the extent to which they occur in AI search engines has now been quantified.
This issue was highlighted in a 2023 article by Ted Gioia of The Honest Broker, where he documented ChatGPT's tendency to generate incorrect information with complete certainty.
Even when the AI admitted to being wrong, it would sometimes follow up with more false claims.
According to the researchers, AI companies continue to charge users premium prices for access to their tools without disclosing their inaccuracy rates. Subscription fees range from $20 to $200 per month, yet the paid versions, Perplexity Pro ($20/month) and Grok-3 Search ($40/month), had higher error rates than their free counterparts.
The study also revealed that several chatbots bypassed publishers' Robots Exclusion Protocol (robots.txt) preferences, accessing content from sites that had explicitly blocked their crawlers. Perplexity Pro, for example, correctly identified nearly a third of excerpts from articles it should not have had access to.
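For readers unfamiliar with the mechanism: publishers state these preferences in a robots.txt file, which a compliant crawler is expected to consult before fetching any page. A minimal sketch using Python's standard library follows; the crawler name "ExampleAIBot" and the URLs are placeholders, not any real company's crawler.

```python
# How a compliant crawler honours the Robots Exclusion Protocol, using
# Python's standard urllib.robotparser. "ExampleAIBot" is a placeholder
# user-agent, not a real crawler.
from urllib import robotparser

parser = robotparser.RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()  # fetch and parse the publisher's robots.txt

url = "https://example.com/news/story"
if parser.can_fetch("ExampleAIBot", url):
    print("robots.txt permits fetching", url)
else:
    # A compliant crawler stops here. The study found several chatbots
    # retrieved content from publishers that had disallowed them anyway.
    print("robots.txt disallows", url)
```

A publisher wanting to block that crawler outright needs only two lines in robots.txt: "User-agent: ExampleAIBot" followed by "Disallow: /".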
Licensing deals with news sources did not guarantee accurate citation in chatbot responses. Despite partnerships between AI companies and publishers, the study found a wide range of accuracy in responses related to partner publishers.
The Tow Center's research, published in the Columbia Journalism Review, warns that the perception of AI as a shortcut to knowledge, particularly among younger users, could leave a generation poorly equipped to research and analyse information for itself.
The study calls for a shift in perspective, advocating for AI to be understood as a tool for extending human capabilities, rather than replacing them.