Popular Mechanics

Why Geneticists Really, Really Hate Microsoft Excel

Popular Mechanics 28/08/2020 22:36:00 Courtney Linder
Image: Genetic scientists have changed the names for certain genes that don't play well with Excel. © Microsoft Corporation/Public domain
  • Scientists at the HUGO Gene Nomenclature Committee have changed the names for certain genes that don't play well with Excel.
  • That's because some genes, like SEPT1, auto-format to dates in Excel spreadsheets (the committee renamed that gene SEPTIN1, for example).
  • They outline the new standards in an article published in the journal Nature Genetics.

Human genes and Excel data entries simply don't get along.

According to the Human Genome Project, we each have between 20,000 and 25,000 genes, which help determine our physical characteristics. Because we have so many genes, scientists give each a unique name, according to the National Institutes of Health.

But the names can be lengthy and technical, so researchers routinely shorten them to an abbreviated version, called a symbol. The gene on chromosome 7 that has been associated with cystic fibrosis (cystic fibrosis transmembrane conductance regulator) becomes CFTR, for example.

Just one problem: Excel does not play nice with certain gene symbols, converting them into dates. That's extremely problematic, as researchers must be able to share massive amounts of data. They can't turn off auto-formatting options, and even changing the data type for certain columns can still introduce errors.
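
To make the problem concrete, here is a minimal Python sketch that flags gene symbols a spreadsheet might read as dates. The month list and the pattern are illustrative assumptions, not Excel's actual parsing rules, and `looks_like_date` is a hypothetical helper name.

```python
import re

# Month names and abbreviations that a spreadsheet may treat as the start of a date.
# This list is an assumption for illustration, not Excel's real rule set.
MONTH_PREFIXES = (
    "JAN", "FEB", "MAR", "MARCH", "APR", "APRIL", "MAY", "JUN", "JUNE",
    "JUL", "JULY", "AUG", "SEP", "SEPT", "OCT", "NOV", "DEC",
)

# A month prefix, an optional hyphen, then a one- or two-digit number.
DATE_LIKE = re.compile(
    r"^(?:%s)-?(\d{1,2})$" % "|".join(MONTH_PREFIXES), re.IGNORECASE
)


def looks_like_date(symbol: str) -> bool:
    """Return True if the symbol resembles 'month + day' (hypothetical check)."""
    match = DATE_LIKE.match(symbol.strip())
    return bool(match) and 1 <= int(match.group(1)) <= 31


if __name__ == "__main__":
    for symbol in ["SEPT1", "MARCH1", "CFTR", "SEPTIN1", "MARCHF1"]:
        print(symbol, looks_like_date(symbol))
```

A check like this could be run over a gene list before it is pasted into a spreadsheet, so that risky symbols are spotted while they are still plain text.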

Experts with the HUGO Gene Nomenclature Committee (HGNC), the standards organization for naming genes, based in Hinxton, England, have had enough. They've published an article in the journal Nature Genetics, outlining a new set of rules for naming certain genes (and the corresponding proteins they express) that give rise to data entry errors.

"Standardized gene naming is crucial for effective communication about genes, and as genomics becomes increasingly important in health care, the need for a consistent language to refer to human genes becomes ever more essential," the authors write.

These updates include changes to all symbols that auto-convert to dates in Excel. For example, SEPT1, a gene on human chromosome 16 that encodes a protein that may contribute to the neurofibrillary tangles associated with Alzheimer's, now becomes SEPTIN1. MARCH1, another protein-encoding gene, found on chromosome 4, is now MARCHF1.
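
For anyone holding older gene lists, the renaming amounts to a simple lookup. The sketch below assumes a small mapping containing only the two renames mentioned above; the full set of changes would have to come from HGNC itself, and `update_symbols` is a hypothetical helper name.

```python
# Old symbol -> new symbol. Only the two examples named in this article are
# included; this is an illustrative subset, not the complete HGNC list.
RENAMED = {
    "SEPT1": "SEPTIN1",
    "MARCH1": "MARCHF1",
}


def update_symbols(symbols):
    """Replace retired symbols with their new names, leaving others unchanged."""
    return [RENAMED.get(symbol, symbol) for symbol in symbols]


if __name__ == "__main__":
    print(update_symbols(["SEPT1", "CFTR", "MARCH1"]))
```

Note that a lookup like this only helps while the old symbols are still stored as text. Once a spreadsheet has turned a symbol into a date, the original name is no longer in the cell to look up.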

To date, HGNC has changed 27 gene names in this manner, Elspeth Bruford, the coordinator of HGNC, tells The Verge. "We consulted the respective research communities to discuss the proposed updates, and we also notified researchers who had published on these genes specifically when the changes were being put into effect," she says.

This issue has been a headache for genomics researchers for some time. An August 2016 study published in the journal Genome Biology found that Microsoft Excel and similar spreadsheet programs had introduced errors into about 20 percent of genetics papers that supplied gene lists as Excel files.

Researchers downloaded and screened supplementary files from over 35,000 papers published in 18 journals between 2005 and 2015. Of those, about 3,600 included Excel spreadsheets with lists of genes. One in five of those contained at least one error. This problem, according to the paper, dates back to 2004.

"Inadvertent gene symbol conversion is problematic because these supplementary files are an important resource in the genomics community that are frequently reused," the authors wrote. "Our aim here is to raise awareness of the problem."

Now that HGNC has finally resolved these issues with updated nomenclature, geneticists should be able to sleep better at night. Still, it sounds like a whole lot of trouble to go through for some spreadsheets, right? Why couldn't Excel simply stop auto-formatting the genes into dates?

Human gene names may change, but Excel nightmares will live on forever.

Saturday 29 August 2020 01:36:00 Categories: Popular Mechanics
