HPV-type Prediction System using SVM and Partial Sequential Pattern

Kim, Jinsu

doi:10.14400/JDC.2014.12.12.365

Journal of Digital Convergence

Vol. 12, No. 12 / pp. 365-370 / 2014 / 2713-6434 (pISSN) / 2713-6442 (eISSN)

The Society of Digital Policy and Management


Kim, Jinsu (College of Liberal Arts, Anyang University)

Received: 2014.09.05
Reviewed: 2014.12.20
Published: 2014.12.28

Abstract

Existing systems generate patterns from whole sequences or from unaligned sequences, so the number of patterns grows exponentially and pattern extraction consumes considerable time and cost. Rather than mining patterns from the entire protein sequence, this paper proposes a classification system that uses multiple sequence alignment to divide protein sequences into partitioned sequence sections and extracts sequential patterns from each section. The extracted patterns are merged into a set of motif candidates, which is used as the training set for an SVM classifier. The trained SVM then predicts the HPV type (high/low) of known or unknown protein sequences. At a minimum support of 30%, the proposed system showed better precision and recall than the previous system.
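The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the fixed section width, the use of contiguous k-mers in place of general sequential patterns, the toy sequences, and all function names are assumptions made for illustration. The sketch splits already-aligned sequences into sections, keeps the patterns that meet a minimum support within each section, and encodes every sequence as a binary vector that could serve as SVM input.

```python
from collections import Counter

def split_sections(aligned, width):
    """Cut every aligned sequence into fixed-width column sections."""
    length = len(aligned[0])
    return [[seq[start:start + width] for seq in aligned]
            for start in range(0, length, width)]

def frequent_patterns(section, k, min_support):
    """Return gap-free k-mers found in at least min_support of the rows."""
    counts = Counter()
    for fragment in section:
        # Use a set so each row contributes at most once per pattern.
        kmers = {fragment[i:i + k] for i in range(len(fragment) - k + 1)}
        counts.update(km for km in kmers if "-" not in km)
    threshold = min_support * len(section)
    return sorted(km for km, c in counts.items() if c >= threshold)

def motif_candidates(aligned, width, k, min_support):
    """Merge per-section patterns into (section_index, pattern) candidates."""
    candidates = []
    for idx, section in enumerate(split_sections(aligned, width)):
        for pattern in frequent_patterns(section, k, min_support):
            candidates.append((idx, pattern))
    return candidates

def to_feature_vector(aligned_seq, candidates, width):
    """Binary vector: 1 if the candidate pattern occurs in its own section."""
    return [1 if pattern in aligned_seq[idx * width:(idx + 1) * width] else 0
            for idx, pattern in candidates]

# Toy aligned sequences (invented, not HPV data); '-' marks an alignment gap.
aligned = ["MKTAYLQR", "MKTA-LQR", "MRTAYLQK", "MKSAYLQR"]
candidates = motif_candidates(aligned, width=4, k=2, min_support=0.3)
features = [to_feature_vector(seq, candidates, width=4) for seq in aligned]
```

In a full system, the rows of `features`, paired with high/low labels, would be passed to an SVM implementation for training. Restricting the search to one section at a time is what keeps the candidate set small compared with mining the whole sequence.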

