Search feasibility in distributed MS-proteomics big data
Mohammad, Umair; Saeed, Fahad
:1-1
(2021).
Abstract
Making large-scale Mass Spectrometry (MS) data FAIR (Findable, Accessible, Interoperable, Reusable) and democratizing access for the omics research community requires advanced access and reuse mechanisms. In this work, we proposed a novel distributed data access infrastructure and developed a simulation test-bed to show the feasibility of this solution. In contrast to existing centralized approaches, the participating nodes execute the search algorithm themselves, and searches compare raw spectra rather than relying on metadata alone. Simulation results using networking, stochastic modelling, and queuing theory illustrated that search times were reduced by up to 600 times for up to a total of fifty billion spectra. Proteomics is vital because of the importance of proteins to life and their role in state-of-the-art medicine such as custom drug delivery and cancer treatment …