A Method for Protein Identification Based on MS/MS using Probabilistic Graphical Models

확률그래프모델을 이용한 MS/MS 기반 단백질 동정 기법

  • Li, Hong-Lan (School of Computer Science and Engineering, Soongsil University)
  • Hwang, Kyu-Baek (School of Computer Science and Engineering, Soongsil University)
  • Published : 2012.06.22


Identifying the proteins present in a biological sample involves a sequence of steps: protein purification and digestion, fragmentation of the resulting peptides by tandem mass spectrometry (MS/MS), peptide identification, and protein identification. Probabilistic approaches such as ProteinProphet and BaysPro are widely used for the protein identification step. However, they do not account for how peptide identification probabilities vary with peptide length. Here, we propose a probabilistic graphical model-based approach to protein identification from MS/MS data that considers peptide identification probabilities, the number of sibling peptides, and peptide length. We compared our approach with ProteinProphet on a yeast MS/MS dataset. Our model identified 27 more proteins than ProteinProphet at a false discovery rate (FDR) of 1%, confirming the importance of peptide length information in protein identification.
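
To make the protein identification step concrete, the following is a minimal sketch of the general idea behind probabilistic protein inference. It is not the graphical model proposed in this paper, and it omits the sibling-peptide and peptide-length terms entirely. It assumes that peptide identifications are independent, so a protein is taken to be present if at least one of its peptides is correctly identified, and it estimates the FDR of an accepted protein list as the average of (1 - probability). The function names and the example probabilities are hypothetical and chosen only for illustration.

```python
from math import prod


def protein_probability(peptide_probs):
    """Probability that at least one peptide identification is correct.

    Assumes the peptide identifications are independent (an assumption
    of this sketch, not a claim about the paper's model).
    """
    return 1.0 - prod(1.0 - p for p in peptide_probs)


def estimated_fdr(protein_probs, threshold):
    """Expected fraction of false positives among accepted proteins.

    A protein is accepted when its probability is >= threshold; each
    accepted protein contributes (1 - probability) expected errors.
    """
    accepted = [p for p in protein_probs if p >= threshold]
    if not accepted:
        return 0.0
    return sum(1.0 - p for p in accepted) / len(accepted)


if __name__ == "__main__":
    # Hypothetical peptide probabilities for three candidate proteins.
    proteins = {
        "protein_A": [0.9, 0.8],
        "protein_B": [0.5, 0.5],
        "protein_C": [0.1],
    }
    probs = {name: protein_probability(p) for name, p in proteins.items()}
    for name, prob in probs.items():
        print(f"{name}: {prob:.3f}")
    print(f"estimated FDR at 0.7: {estimated_fdr(list(probs.values()), 0.7):.3f}")
```

In a sketch like this, a peptide-length effect could enter by adjusting each peptide probability before the combination step, but how the paper's model does so is not specified in the abstract.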