Jimmy Wales: ChatGPT is no threat to Wikipedia - yet
'It just hallucinates way too much'
Large language models are not suitable for producing factual content, but a threat could come from pressure on copyright law, says Wikipedia founder.
"Everyone's always asking if ChatGPT is a threat to Wikipedia," said Jimmy Wales, speaking at an OpenUK press event last week.
"Our view is no it isn't, at least not right now, because it just simply isn't good enough yet."
ChatGPT produces plausible but often completely inaccurate responses to users' prompts, making it unsuitable for generating factual content, he said.
"We wouldn't use it for famous topics, we're good at that already, and for more obscure topics where it might seem to be helpful, it just hallucinates way too much, so it's not really useful."
That said, LLMs could add to the effectiveness of human contributors, especially in the time-consuming task of checking entries against references for accuracy and completeness, he went on.
The LLM could be instructed to read an entry then go through the cited references and search for content on the Wikipedia page that is not supported by the sources, or conversely things that are in the sources and really should be in the entry but are missing.
"If we have a human checking output and doing something about it, that could be a useful tool for a community member to go, 'okay, I want to improve this entry, give me some quick ideas of sentences I should look at or facts that I might want to include'."
Copyright minefield
Copyright with respect to LLMs is a hot issue. Wales estimates that 50% of the information used to train GPT-4 came from Wikipedia, which is in the public domain. He said he had no problem with this, but others don't see it the same way. US comedian Sarah Silverman and others are suing OpenAI and Meta for alleged infringement in the use of their work as input for their training models.
At the other end of the process, US courts recently ruled that the output of AI models cannot be copyrighted since it is not the product of a human being, and so it should be in the public domain.
The case lodged by Silverman and others is unlikely to succeed, according to Wales, and the law against copyrighting AI output will hold for now. But he has no doubt that around the world there will be pressure to change the legislation, particularly on the input side.
"We see a real danger in this idea that your copyright on something somehow means you own the facts in it, because that's never been the case for copyright," he said.
"And obviously, some of the scientific publishers like Elsevier would love to say, finally, you can stop using Wikipedia to read about scientific papers. We want to sell those instead."
On the output side, Wales said the sudden influx of new AI-generated data in the public domain is going to be "really interesting. This huge volume of body of text is going to be not copyrightable."
On AI regulation
Wales said there is no easy solution to regulating AI, but added that in his opinion the EU's approach is misguided, since it is aimed at "only a handful of big tech giants."
It overlooks open-source models, which are progressing rapidly and are in many cases comparable with those developed by the big names, and it is overly prescriptive, he said, predicting it will be overtaken by events.
"In my view, the EU AI act is a classic piece of European legislation that is going to leave Europe behind and the US completely dominant."