Machine-learning techniques often produce misleading or wrong results, researcher warns
Algorithms don't know how to say 'the data is not clear' or 'I don't know'
The increasing use of machine-learning techniques in scientific research is leading to a 'science crisis', claims Rice University statistician Dr Genevera Allen, who has found that the results produced by machine-learning algorithms are often misleading or wrong.
Machine-learning techniques are now being used by thousands of scientists to analyse huge amounts of data, especially in the fields of astronomy and biomedical science.
However, Dr Allen suggests that researchers must keep questioning whether the predictions and findings made by machine-learning techniques are reproducible, at least until new computational systems that can critique their own results are developed.
Dr Allen warns that using these techniques in scientific research without improving them wastes both time and money.
Machine learning is a branch of artificial intelligence that gives computer systems the ability to learn automatically from experience, without explicit human intervention or support. The learning process starts with data or observations fed into the system, which then searches them for patterns that it can use to make better decisions in future.
A major issue with machine-learning techniques is that they don't know how to say "the data is not clear" or "I don't know".
Instead, they almost always produce an answer, and that answer may be less accurate than researchers believe. A machine-learning algorithm will try to find a pattern in the data even when the pattern is only marginally present, and such a pattern may or may not hold in the real world.
"Often these studies are not found out to be inaccurate until there's another real big dataset that someone applies these techniques to and says 'Oh my goodness, the results of these two studies don't overlap'," Dr Allen said.
"There is general recognition of a reproducibility crisis in science right now," she added.
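The failure mode described above can be illustrated with a short Python sketch. It is an illustrative example, not Dr Allen's analysis: the helper names `best_feature` and `noise_dataset`, and the sizes used (50 samples, 1,000 candidate features), are hypothetical choices. Every feature and the outcome are pure random noise, so no real relationship exists by construction, yet the search still returns a "best" feature with a sizeable correlation. The same feature is then checked on a second, independently drawn dataset.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def best_feature(features, outcome):
    """Return (index, correlation) of the feature most correlated with outcome.

    Like many pattern-finding procedures, this always returns an answer;
    it has no way to report 'there is no real pattern here'.
    """
    scores = [pearson(f, outcome) for f in features]
    idx = max(range(len(scores)), key=lambda i: abs(scores[i]))
    return idx, scores[idx]

def noise_dataset(n_samples, n_features, rng):
    """Hypothetical dataset in which every feature and the outcome are independent noise."""
    features = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_features)]
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    return features, outcome

rng = random.Random(0)

# First study: search 1,000 noise features for the "strongest" pattern.
train_X, train_y = noise_dataset(n_samples=50, n_features=1000, rng=rng)
idx, r_train = best_feature(train_X, train_y)

# Second, independent study drawn the same way: does the pattern hold up?
test_X, test_y = noise_dataset(n_samples=50, n_features=1000, rng=rng)
r_test = pearson(test_X[idx], test_y)

print(f"feature {idx}: r = {r_train:.2f} in first dataset, r = {r_test:.2f} in second")
```

In runs like this, the winning feature typically shows a correlation of around 0.4 to 0.5 purely by chance, while its correlation in the second dataset usually falls close to zero: the two "studies" do not agree.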
Dr Allen, whose research was recently presented at a meeting of the American Association for the Advancement of Science in Washington, is working on the next generation of machine-learning and statistical techniques in collaboration with a team of researchers from Baylor College of Medicine in Houston.
Dr Allen said these new techniques will not only make discoveries from huge amounts of data but also state how reproducible or uncertain their results are.
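One generic way for an analysis to report its own uncertainty, and in effect to say "I don't know", is to attach a bootstrap confidence interval to each result. The Python sketch below is an illustration under that assumption, not the method being developed by Dr Allen's team. The function names `bootstrap_ci` and `report`, the 95% level and the 2,000 resamples are hypothetical choices. It estimates a correlation, resamples the data with replacement to see how much the estimate varies, and reports "unclear" when the resulting interval includes zero.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the correlation of x and y."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        # Resample (x, y) pairs with replacement and recompute the statistic.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(pearson([x[i] for i in idx], [y[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def report(x, y):
    """Return the estimate with its interval, or 'unclear' if the interval includes 0."""
    r = pearson(x, y)
    lo, hi = bootstrap_ci(x, y)
    if lo <= 0 <= hi:
        return f"unclear: r = {r:.2f}, 95% interval [{lo:.2f}, {hi:.2f}] includes 0"
    return f"r = {r:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]"

rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(50)]
signal = [xi + rng.gauss(0, 0.5) for xi in x]  # y genuinely depends on x
noise = [rng.gauss(0, 1) for _ in range(50)]   # y independent of x
print(report(x, signal))
print(report(x, noise))
```

The design choice here is that the procedure returns a range rather than a single number, so a weak or spurious result is flagged instead of being reported with false confidence. A bootstrap interval measures variability within one dataset only; it does not replace checking a finding against an independent dataset.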
"Collecting these huge data sets is incredibly expensive. And I tell the scientists that I work with that it might take you longer to get published, but in the end your results are going to stand the test of time," Dr Allen told BBC News.
"It will save scientists money and it's also important to advance science by not going down all of these wrong possible directions," she added.