CAMS-RS: clustering algorithm for large-scale mass spectrometry data using restricted search space and intelligent random sampling
Saeed, Fahad; Hoffert, Jason D; Knepper, Mark A
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), IEEE Computer Society Press, 11:128-141 (2014).
Abstract
High-throughput mass spectrometers can produce massive amounts of redundant data at an astonishing rate, and many of the resulting spectra have a poor signal-to-noise (S/N) ratio. These low-S/N spectra may not be interpreted by conventional spectra-to-database matching techniques. In this paper, we present an efficient algorithm, CAMS-RS (Clustering Algorithm for Mass Spectra using Restricted Space and Sampling), for clustering raw mass spectrometry data. CAMS-RS utilizes a novel metric (called F-set) that exploits temporal and spatial patterns to accurately assess the similarity between two given spectra. The F-set similarity metric is independent of retention time and allows clustering of mass spectrometry data from independent LC-MS/MS runs. A novel restricted search space strategy is devised to limit the number of spectra comparisons. An intelligent sampling method is executed on individual …