A one-million-row limit in Microsoft's Excel spreadsheet software may have caused Public Health England to lose nearly 16,000 Covid-19 test results.
The data error, which led to 15,841 positive tests being left out of official daily figures, means that more than 50,000 potentially infectious people may have been missed by contact tracers and not told to self-isolate.
How did the incident take place?
PHE is responsible for gathering test results from public and private laboratories and publishing daily updates on the number of cases and tests performed.
But the rapid development of the testing program has meant that much of the work is still done manually, with individual labs sending spreadsheets of results to PHE.
Although the system has improved since the early days of the pandemic, when some work was done with phone calls, pens and paper, it is still far from automated.
In this unfortunate case, a lab sent its daily test report to PHE as a CSV file, the simplest possible database format: just a list of values separated by commas.
That report was then loaded into Microsoft Excel, and the new tests at the bottom were added to the main database.
But while CSV files can be any size, Microsoft Excel files can only be 1,048,576 rows long (or, in older versions that PHE may still have been using, only 65,536).
When a longer CSV file is opened, the bottom rows are cut off and no longer displayed. This means that once the lab had performed more than a million tests, it was only a matter of time before its reports could no longer be read in full by PHE.
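The truncation described above can be caught before a file is ever opened in a spreadsheet. The following is a minimal sketch, not a description of PHE's actual process: it counts the records in a CSV stream and reports how many would fall beyond a sheet's row limit. The function names, the column names and the 70,000-row sample report are all hypothetical; only the two row limits come from the figures quoted above.

```python
import csv
import io

XLSX_MAX_ROWS = 1_048_576  # row limit of current Excel worksheets
XLS_MAX_ROWS = 65_536      # row limit of older Excel versions


def count_csv_rows(stream):
    """Count the records in an open CSV text stream, header included."""
    return sum(1 for _ in csv.reader(stream))


def rows_lost_in_excel(row_count, max_rows=XLSX_MAX_ROWS):
    """Return how many rows would fall beyond a sheet's row limit."""
    return max(0, row_count - max_rows)


# Hypothetical lab report: one header line plus 70,000 results.
report = io.StringIO(
    "test_id,result\n" + "".join(f"{i},negative\n" for i in range(70_000))
)
total = count_csv_rows(report)

# Rows in the file, rows lost under the old limit, rows lost under the new one.
print(total, rows_lost_in_excel(total, XLS_MAX_ROWS), rows_lost_in_excel(total))
# prints: 70001 4465 0
```

A check of this kind would turn a silent cut-off into a visible warning: any non-zero result means the file is too long for the sheet it is about to be loaded into.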
Excel is ubiquitous, so the problem is likely to spread
Microsoft's spreadsheet software is one of the most popular business tools in the world, but it is regularly implicated in errors that can be costly or even dangerous, precisely because it is so easy to use.
In 2013, an Excel error at JPMorgan masked a loss of nearly $6 billion after a cell mistakenly divided by the sum of two interest rates rather than their average.
The news prompted James Kwak, a law professor at the University of Connecticut, to warn that Excel is “incredibly fragile.”
“There is no way to track where the data comes from, there is no audit trail, so you can overtype numbers and not know it.
The biggest problem is that anyone can create Excel spreadsheets, and anyone can do it wrong. Because it is so easy to use, creating even important spreadsheets is not limited to people who understand programming and do it in a methodical, well-documented way,” Kwak wrote.
Excel's errors have even changed the very foundations of human genetics. The names of 27 genes have been changed over the past year by the HUGO Gene Nomenclature Committee, after the Microsoft program repeatedly misformatted them.
The SEPT1 and MARCH1 genes, for example, were renamed SEPTIN1 and MARCHF1 after being repeatedly converted into dates.
Names that could be confused with other words have also been changed so that grammar tools no longer automatically correct them: WARS is now WARS1, for example.
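The date problem with names such as SEPT1 and MARCH1 can be screened for before a list of gene symbols goes anywhere near a spreadsheet. The sketch below is a rough heuristic only and does not reproduce Excel's real parsing rules: it assumes that a month name or abbreviation followed directly by one or two digits is at risk. The pattern and the `looks_like_date` helper are hypothetical names introduced for illustration.

```python
import re

# Rough heuristic, NOT Excel's actual rules: a month name or abbreviation
# followed directly by one or two digits, ignoring case.
MONTH_LIKE = re.compile(
    r"^(jan|feb|mar|march|apr|april|may|jun|june|jul|july|"
    r"aug|sep|sept|oct|nov|dec)\d{1,2}$",
    re.IGNORECASE,
)


def looks_like_date(symbol):
    """Return True if a gene symbol resembles a month plus a day number."""
    return bool(MONTH_LIKE.match(symbol))


# Old and new symbols mentioned in the article.
symbols = ["SEPT1", "MARCH1", "SEPTIN1", "MARCHF1", "WARS1"]
flagged = [s for s in symbols if looks_like_date(s)]
print(flagged)  # prints: ['SEPT1', 'MARCH1']
```

Under this heuristic the old names are flagged while the replacements pass, which is the property the renaming was meant to achieve.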