Methods for Proteogenomics Data Analysis, Challenges, and Scalability Bottlenecks: A Survey

Tariq, Muhammad Usman; Haseeb, Muhammad; Aledhari, Mohammed; Razzak, Rehma; Parizi, Reza M; Saeed, Fahad. IEEE Access, IEEE (2021).

Abstract

Big Data Proteogenomics lies at the intersection of high-throughput Mass Spectrometry (MS) based proteomics and Next Generation Sequencing based genomics. The integrated analysis of data from these two high-throughput technologies can help discover novel proteins using genomic and transcriptomic data. Due to the biological significance of integrated analysis, the recent past has seen an influx of proteogenomic tools that perform various tasks, including mapping proteins to genomic data, searching experimental MS spectra against a six-frame translated genome database, and automating the annotation of genome sequences. To date, most such tools have not focused on the scalability issues inherent in proteogenomic data analysis, where the database is much larger than a typical protein database. These state-of-the-art tools can take more than half a month to process …
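
As an illustration of the six-frame translation mentioned above, the following is a minimal Python sketch, not taken from the surveyed tools. It translates a DNA sequence in three reading-frame offsets on the forward strand and three on the reverse-complement strand using the standard genetic code. The function names (`reverse_complement`, `translate`, `six_frame_translation`) and the choice to emit `X` for codons containing ambiguous bases are assumptions made for this example. Because every stretch of the genome is translated six ways, the resulting search space is far larger than a curated protein database, which is the source of the scalability concern.

```python
# Illustrative sketch only: standard genetic code, no external libraries.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its one-letter amino acid ('*' marks a stop codon).
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(dna))


def translate(dna: str) -> str:
    """Translate DNA codon by codon; a trailing partial codon is ignored.

    Codons containing bases outside A/C/G/T become 'X' (an assumption
    of this sketch, not a rule taken from the surveyed tools).
    """
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def six_frame_translation(dna: str) -> dict:
    """Return the six translations keyed by frame label '+1'..'+3', '-1'..'-3'."""
    dna = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

In a real proteogenomic pipeline, each translated frame would typically be split at stop codons into candidate open reading frames and then digested in silico before being matched against MS spectra; those steps are omitted here.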