Microsoft Excel errors mar one-fifth of science papers on gene research

Over-zealous date conversion to blame

As many as 20 per cent of gene research papers that come with supplementary Excel gene lists contain errors introduced by Microsoft's spreadsheet software.

According to the report, published in the BioMed Central journal Genome Biology, the errors were largely caused by Excel's obsession with converting gene names into dates and floating-point numbers.

The researchers, Mark Ziemann, Yotam Eren and Assam El-Osta, suggested that the errors are easily made when using Microsoft Excel's default settings.

"The spreadsheet software Microsoft Excel, when used with default settings, is known to convert gene names to dates and floating-point numbers," said the report.

"A programmatic scan of leading genomics journals reveals that approximately one fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions."

The problem has been known about since 2004, just three years before it emerged that Excel couldn't cope with anything involving the number 65,535.

Many of the problems involve gene symbols being misread as dates. The report said that the gene symbols SEPT2 (Septin 2) and MARCH1 (Membrane-Associated Ring Finger) are converted by Excel into 2-Sep and 1-Mar, leaving them useless as identifiers.

RIKEN identifiers are automatically converted by Excel into floating-point numbers, the researchers found, for example from accession 2310009E13 to 2.31E+13. SEPT2 was also found converted into the full date 2006/09/02.
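
For readers who want to check their own gene lists, the conversions described above can be spotted with a few pattern matches. The Python below is a minimal sketch, not the researchers' actual scanning code: the function names and patterns are hypothetical and cover only the forms mentioned in this article (2-Sep, 2006/09/02 and 2.31E+13).

```python
import re

# Assumption: only the converted forms quoted in the article are covered.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Day-month forms such as "2-Sep", month-day forms such as "Sep-2",
# and full dates such as "2006/09/02".
DATE_LIKE = re.compile(
    rf"^(\d{{1,2}}-({MONTHS})|({MONTHS})-\d{{1,2}}|\d{{4}}/\d{{2}}/\d{{2}})$",
    re.IGNORECASE,
)

# Scientific notation as displayed by a spreadsheet, such as "2.31E+13".
FLOAT_LIKE = re.compile(r"^\d+(\.\d+)?E\+\d+$", re.IGNORECASE)


def looks_converted(cell: str) -> bool:
    """Return True if a cell looks like a date or float, not a gene name."""
    cell = cell.strip()
    return bool(DATE_LIKE.match(cell) or FLOAT_LIKE.match(cell))


def suspicious_entries(column):
    """Return the cells in a gene-name column that look converted."""
    return [cell for cell in column if looks_converted(cell)]


if __name__ == "__main__":
    genes = ["SEPT2", "2-Sep", "TP53", "2.31E+13", "1-Mar", "2310009E13"]
    print(suspicious_entries(genes))
```

Intact symbols such as SEPT2 and the original accession 2310009E13 pass through untouched, while their converted forms are flagged. A real scan would need more patterns, since Excel's date display varies with locale settings.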

"Supplementary files in Excel format from 18 journals published from 2005 to 2015 were programmatically screened for the presence of gene name errors. We screened 35,175 supplementary Excel files, finding 7,467 gene lists attached to 3,597 published papers," said the report.