Data is said to be the oil of the 21st century, and the ability to analyze and interpret it is one of today's major economic forces. Industries such as medicine are already seeing the benefits of taking advantage of data. In particular, the enormous amount of information about patients' genomes is bringing unprecedented opportunities for new research, treatment, and early diagnosis. It is estimated that between two exabytes (EB) and 40 EB of genomic data per year will be generated by 2025. This is far more than what will be produced in astronomy (one EB per year) or by YouTube (between one and two EB per year).
These figures will turn the genomics industry into one of the sectors that will require the greatest data processing capacity and all the aspects that this implies: storage, exchange, and analysis. Processing such vast amounts of information is becoming a challenge that hinders the progress of genomics. To facilitate the management and compression of these files, researcher Idoia Ochoa has developed new digital formats that store, transmit, visualize, and analyze genome-related data. Thanks to this project, the young Spanish woman has been selected as one of the winners of Innovators Under 35 Europe from MIT Technology Review.
Ochoa is an adjunct professor in the Department of Electrical and Computer Engineering at the University of Illinois Urbana-Champaign (USA) and, as part of her research, she proposed a solution to end the imperfections of next-generation sequencing (NGS) technologies. Currently, raw genome files from NGS techniques occupy hundreds of gigabytes (GB), a volume that makes it difficult to read and detect variants in the genome. Ochoa's algorithms facilitate the reading of the genome and preserve the quality of the data. QualComp, QVZ, GeneComp, AliCo, FaStore, and SPRING are some examples of these algorithms; the last of these is able to reduce the raw data of the genome from 196 GB to just seven GB.
The reduction of latency in data transmission, the flexibility to create data storage methodologies adaptable to each system, and the possibility of accessing accurate data are other benefits of Ochoa's solutions. These aspects ultimately make it possible to analyze each patient's genome and promote personalized medicine.
Developing algorithms that facilitate compression is not enough, according to the researcher; it is also crucial that any institution is capable of using them. To achieve this, she is working with a group of experts to develop the ISO MPEG-G standard for genomic representation. In this way, their developments will be compatible with existing tools and infrastructures, facilitate data exchange between healthcare providers, and contribute to genome analysis being increasingly present in the medical community.
"The standard includes a series of specifications that determine how genome data will be represented," says Ochoa. In this phase, her contribution has focused above all on ensuring that the formats occupy little space while allowing quick and effective access to several parts of the data. "The first version of the standard has already been produced," says Ochoa, adding that they are awaiting approval for the standard to be used. At the moment, companies such as Roche, Genapsys, and Guardant Health have already shown interest in the developed compressors.
Tiina Hynninen, CEO of Livindas Oy and member of the Innovators Under 35 Europe 2019 jury, says that Ochoa's project "has a great impact, shows ingenuity, is very timely, and will help the society of the future to improve decision making and medical treatments."
By Alba Casilda
Translation: Brian Bostwick