A new algorithm, ProVerB, based on a novel binomial distribution statistical model

Date: 2015-12-04

Soft ionization techniques, e.g. matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI), maintain the integrity of peptides, thus enabling mass spectrometry (MS) methods to perform proteomic analysis. Protein identification is the most fundamental step in the data processing pipeline, since the sensitivity and accuracy of the identification algorithm are crucial for downstream analyses. Generally, a peptide identification algorithm selects some peaks from the experimental spectrum, evaluates the similarity between the experimental and theoretical spectra, and then assigns the best match within the peptide error window as the result. The scoring models that evaluate this similarity should consider three aspects: the number of peak matches, the number of consecutive peak matches, and the intensities of the matched peaks.
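The three aspects named above can be made concrete with a minimal Python sketch. It is an illustration only, not the scoring model of any tool discussed here: the function name `match_features`, the tuple layout of the peaks, and the 0.5 Da tolerance are all assumptions made for the example.

```python
from bisect import bisect_left


def match_features(exp_peaks, theo_mz, tol=0.5):
    """Return (n_matches, longest_consecutive_run, matched_intensity).

    exp_peaks: list of (m/z, intensity) tuples, sorted by m/z.
    theo_mz:   theoretical fragment m/z values in ladder order (b1, b2, ...).
    tol:       fragment mass tolerance in Da (an assumed value).
    """
    exp_mz = [mz for mz, _ in exp_peaks]
    n_matches = 0
    longest_run = 0
    current_run = 0
    matched_intensity = 0.0

    for mz in theo_mz:
        # Scan the experimental peaks inside [mz - tol, mz + tol]
        # and remember the most intense one.
        best = None
        for i in range(bisect_left(exp_mz, mz - tol), len(exp_mz)):
            if exp_mz[i] > mz + tol:
                break
            if best is None or exp_peaks[i][1] > best:
                best = exp_peaks[i][1]

        if best is None:
            current_run = 0  # a missing ion breaks the consecutive run
            continue

        n_matches += 1
        current_run += 1
        longest_run = max(longest_run, current_run)
        matched_intensity += best

    return n_matches, longest_run, matched_intensity


if __name__ == "__main__":
    experimental = [(100.0, 10.0), (200.1, 30.0), (300.0, 5.0), (500.2, 20.0)]
    theoretical = [100.0, 200.0, 300.0, 400.0, 500.0]
    print(match_features(experimental, theoretical))
```

A real scoring model would combine these three quantities into one score; how that is done is exactly where the algorithms discussed in this article differ.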

A number of peptide identification algorithms for MS data, built on various concepts, are available, e.g. Mascot, Sequest, OMSSA, X!Tandem, MassWiz, Andromeda, and SQID. Mascot and Sequest are widely used commercial software and commonly adopted search tools in protein identification; however, only limited details of these algorithms have been released. Mascot is based on a probability model, whereas Sequest is based on an empirical scoring model that computes the cross-correlation between experimental and theoretical spectra. Mascot selects the highest peak in each 14 Da mass interval and keeps the peaks whose intensities are above a threshold. Sequest takes consecutive ion matches and intensity information into account; it preprocesses the spectrum by keeping the top 200 peaks and separating the spectrum into ten bins for normalization. X!Tandem uses a hypergeometric scoring model, while OMSSA uses a Poisson scoring model to assess the significance of a peptide match. Both select the 50 most intense peaks by default. MassWiz divides the spectrum dynamically and takes at most the 5 most intense peaks from each bin. SQID keeps the top 80 peaks after deleting parent-related peaks.
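Two of the peak selection strategies described above, a global top-N filter and a highest-peak-per-window filter, can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the actual preprocessing code of Mascot, X!Tandem, OMSSA, or any other tool; the function names and the `threshold` parameter are hypothetical.

```python
def top_n_peaks(peaks, n=50):
    """Keep the n most intense peaks, returned in m/z order.

    peaks: list of (m/z, intensity) tuples.
    """
    kept = sorted(peaks, key=lambda p: p[1], reverse=True)[:n]
    return sorted(kept)


def highest_peak_per_window(peaks, window=14.0, threshold=0.0):
    """Keep the most intense peak in each fixed-width m/z window.

    Peaks whose intensity is not above `threshold` are discarded first.
    Windows are [0, window), [window, 2 * window), ... (an assumed layout).
    """
    best = {}
    for mz, intensity in peaks:
        if intensity <= threshold:
            continue
        idx = int(mz // window)
        if idx not in best or intensity > best[idx][1]:
            best[idx] = (mz, intensity)
    return [best[idx] for idx in sorted(best)]


if __name__ == "__main__":
    spectrum = [(100.0, 5.0), (105.0, 9.0), (120.0, 2.0),
                (130.0, 7.0), (131.0, 1.0)]
    print(top_n_peaks(spectrum, n=2))
    print(highest_peak_per_window(spectrum, window=14.0))
```

The top-N filter favors regions of the spectrum where intense peaks cluster, while the windowed filter spreads the selected peaks across the mass range; this trade-off is one reason the tools differ in their defaults.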

However, none of these algorithms makes full use of the information in MS experiments. They share similar methods to generate theoretical spectra. Considering six types of ions (b-, y-, b-H2O, b-NH3, y-H2O, and y-NH3) in CID (collision-induced dissociation) fragmentation mode, theoretical peak intensities are set to three artificial values: 50 (b- and y-ions), 25 (b- and y-ions that have lost H2O or NH3), and 10 (a-ions). The resulting theoretical spectrum does not fully reflect the intensity characteristics of experimental spectra. Therefore, once the peaks are selected, these algorithms do not use the peak intensity information obtained in the experiment when comparing the experimental and theoretical spectra. This incomplete use of MS information compromises the sensitivity, robustness, and confidence of most of these algorithms. A recent algorithm, SQID, attempts to address this issue by introducing the strength probability of pairwise amino acid fragments to account for the quality of the intensity match.
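To show what such a theoretical spectrum looks like, here is a minimal Python sketch that generates singly charged ions of the six listed types for an unmodified peptide. The residue masses are standard monoisotopic values. The fixed intensities 50 and 25 follow the description above (a-ions are left out for brevity), and the function name and output layout are assumptions for illustration, not the code of any tool named here.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
AMMONIA = 17.026549


def theoretical_spectrum(peptide):
    """Return (label, m/z, intensity) tuples for singly charged fragments.

    Intensities are the artificial constants described in the text:
    50 for b- and y-ions, 25 for their H2O and NH3 neutral losses.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    n = len(peptide)
    ions = []
    prefix = 0.0
    for i in range(1, n):  # cleavage after residue i
        prefix += masses[i - 1]
        b = prefix + PROTON
        y = (total - prefix) + WATER + PROTON
        j = n - i  # index of the complementary y-ion
        ions.extend([
            (f"b{i}", b, 50.0),
            (f"b{i}-H2O", b - WATER, 25.0),
            (f"b{i}-NH3", b - AMMONIA, 25.0),
            (f"y{j}", y, 50.0),
            (f"y{j}-H2O", y - WATER, 25.0),
            (f"y{j}-NH3", y - AMMONIA, 25.0),
        ])
    return sorted(ions, key=lambda ion: ion[1])


if __name__ == "__main__":
    for label, mz, intensity in theoretical_spectrum("PEPTIDE"):
        print(f"{label:8s} {mz:10.4f} {intensity:5.1f}")
```

Every b-ion gets the same intensity regardless of which residues flank the cleavage site, which is the limitation the paragraph above points out.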

To make full use of the MS information and to maximize universality, we present here a novel identification algorithm, the protein verification algorithm based on the binomial probability distribution (ProVerB), to enhance the accuracy, completeness, and robustness of peptide identification. We tested ProVerB against other algorithms using multiple MS data sets. At 1% FDR, its ability and confidence to identify peptides from mass spectra were significantly and consistently higher than those of the widely used Mascot and Sequest.
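This article does not spell out ProVerB's statistical model, so the following Python sketch shows only the generic idea behind a binomial match score: if each of n theoretical peaks matches by chance with probability p, the probability of seeing at least k matches is a binomial tail, and a smaller tail means a more significant match. The function names, the -10 log10 transform, and the example value of p are assumptions for illustration and should not be read as ProVerB's actual formula.

```python
from math import comb, log10


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def match_score(k, n, p):
    """Turn the chance probability of >= k matches into a score.

    k: number of matched theoretical peaks
    n: number of theoretical peaks
    p: assumed probability that one theoretical peak matches at random
    """
    tail = binomial_tail(k, n, p)
    # Guard against floating-point underflow before taking the logarithm.
    return -10.0 * log10(max(tail, 1e-300))


if __name__ == "__main__":
    # With 20 theoretical peaks and a 5% random match rate,
    # 8 matches are far less likely by chance than 3 matches.
    print(match_score(3, 20, 0.05))
    print(match_score(8, 20, 0.05))
```

In such a scheme p would typically depend on the fragment tolerance and on how many experimental peaks are kept, so peak selection and scoring are coupled.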

The boom in proteomics applications and the wide variety of mass spectrometry technologies used for peptide identification call for a versatile and accurate peptide identification algorithm. In this paper, we present a new algorithm, ProVerB, based on a novel binomial distribution statistical model, and we validate its accuracy, robustness, and compatibility. ProVerB is an open source program, so no algorithmic detail is hidden as in the commercial software packages. Users may tune the parameters according to their specific experimental setup to optimize the results. It can also be compiled on various operating systems and has a user-friendly graphical user interface. Although ProVerB does not support ECD/ETD mass spectrometry data, we believe that ProVerB will find broad application in proteomics studies and provide more robust and accurate results than the currently available commercial algorithms, producing a more solid base of data for downstream analyses.

 
 