A Polymorphism Analysis and Visualization Tool for Specific Variation Pattern Identification in Groups of Nucleotide Sequences

  • Lee, Il Seop (이일섭; Department of Computer Science, Chungbuk National University)
  • Lee, Keon Myung (이건명; Department of Computer Science, Chungbuk National University)
  • Received : 2018.10.16
  • Accepted : 2018.12.20
  • Published : 2018.12.31

Abstract

A genome contains all the genetic information of an organism. Within a species, each individual shows unique traits, which can be identified by analyzing nucleotide sequences. Many genome-wide association studies (GWAS) have been carried out to find genetic associations and causes of disease from the bases that differ slightly among individuals. Identifying where such slight variations occur is important for understanding polymorphisms among individuals. In this paper, we introduce an analysis and visualization tool for identifying specific variation patterns of polymorphisms in nucleotide sequences, and we show the validity of the tool by applying it to nucleotide sequences of the subcultured pOka strain of varicella-zoster virus. The tool is expected to help researchers efficiently explore allele frequency variations and genetic factors within a species.
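To make the notion of allele frequency variation concrete, the following is a minimal Python sketch of how a per-position minor allele frequency (MAF) might be computed from aligned sequences. It is not the tool's implementation: the function name `allele_frequencies`, the choice to take the second most common base as the minor allele, and the decision to ignore gaps and ambiguous symbols are all assumptions made for illustration.

```python
from collections import Counter

def allele_frequencies(aligned_seqs):
    """Return, per alignment column, (major base, minor base, MAF).

    Assumption of this sketch: the minor allele is the second most
    common base in the column, and MAF is its share of the A/C/G/T calls.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    result = []
    for col in range(length):
        counts = Counter(s[col].upper() for s in aligned_seqs)
        # Ignore gaps and ambiguous symbols (an assumption of this sketch).
        counts = Counter({b: n for b, n in counts.items() if b in "ACGT"})
        total = sum(counts.values())
        ranked = counts.most_common(2)
        if total == 0:
            result.append((None, None, 0.0))           # no usable base calls
        elif len(ranked) == 1:
            result.append((ranked[0][0], None, 0.0))   # monomorphic column
        else:
            (major, _), (minor, n_minor) = ranked
            result.append((major, minor, n_minor / total))
    return result

# Toy alignment, invented for illustration only.
toy = ["ACGT", "ACGT", "ATGT", "ACGA"]
for position, site in enumerate(allele_frequencies(toy), start=1):
    print(position, site)
```

Comparing these per-position values between groups of sequences (for example, between passages of a strain) is one way to expose the kind of variation pattern the abstract describes.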

Fig. 1. Manhattan plot for GWAS. Genomic coordinates are displayed along the X-axis, and the negative logarithm of the association p-value for each SNP is displayed on the Y-axis.
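The Y-axis transform described in this caption can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `manhattan_points` and the base-10 logarithm are assumptions (base 10 is the usual convention for Manhattan plots, but the caption does not state the base).

```python
import math

def manhattan_points(snps):
    """Map (position, p_value) pairs to (position, -log10(p_value)) points."""
    points = []
    for position, p_value in snps:
        if not 0.0 < p_value <= 1.0:
            raise ValueError(f"p-value out of range at position {position}: {p_value}")
        # Smaller p-values give taller points on the plot.
        points.append((position, -math.log10(p_value)))
    return points

# Invented p-values, for illustration only.
for position, height in manhattan_points([(100, 0.01), (200, 0.5), (300, 1e-8)]):
    print(position, round(height, 3))
```

Because of the negative logarithm, strongly associated SNPs (very small p-values) stand out as the tallest points.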

Fig. 2. System configuration of the tool. The system consists of an input module that reads in the aligned sequences, a GUI for entering information about the input file, and the polymorphism analyzer and visualizer.

Fig. 3. Graphical user interface of the polymorphism analyzer. A user can input the file of aligned nucleotide sequences and its genetic information.

Fig. 4. Snapshots of visualization components. Each base (A, G, C, T) is identified by color. Increases and decreases are identified by the direction of the brightness change. Each component consists of the Major, placed at the ends of the variation, and the Minor, which changes in brightness. A Minor transition can be identified by a change of base color.

Fig. 5. Visualization snapshots for prominent variations in nucleotide sequences. Each graph shows the variation of MAF between individuals, aligned with sequence position. The X-axis of each graph is the sequence position and the Y-axis is the variation of MAF.

Fig. 6. Strain polymorphism analysis results for genome structure and repeat regions at MAF 5-15%. These are the results of the strain polymorphism analysis (Section 3.2.1) at MAF 5-15%. They show an increasing pattern in UL of the genome structure and in ORF over the passages.

Fig. 7. Strain polymorphism analysis results for Major/Minor combinations at MAF 5-15%. These are the results of the strain polymorphism analysis (Section 3.2.1) at MAF 5-15%. They show an increasing pattern in the A/g and T/c combinations.

Fig. 8. Base filtering by major type and MAF variation. All major types are filtered out except Major T, and only bases whose MAF changed by more than 10% are shown.
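The filter described in this caption could be expressed roughly as follows. This is a hedged sketch, not the tool's code: the record layout (`position`, `major`, `maf`), the function name `filter_sites`, and the reading of "changed by more than 10%" as a difference of more than 0.10 between the largest and smallest MAF across samples are all assumptions.

```python
def filter_sites(sites, major="T", min_change=0.10):
    """Keep sites with the given major base whose MAF range exceeds min_change.

    Each site is a dict with hypothetical keys:
      "position": int, "major": str, "maf": list of MAF values, one per sample.
    """
    kept = []
    for site in sites:
        mafs = site["maf"]
        # Assumption: "change" means the spread between highest and lowest MAF.
        change = max(mafs) - min(mafs)
        if site["major"] == major and change > min_change:
            kept.append(site)
    return kept

# Invented records, for illustration only.
sites = [
    {"position": 10, "major": "T", "maf": [0.05, 0.12, 0.20]},  # kept
    {"position": 20, "major": "T", "maf": [0.05, 0.08, 0.12]},  # change too small
    {"position": 30, "major": "A", "maf": [0.02, 0.30, 0.40]},  # wrong major base
]
print([s["position"] for s in filter_sites(sites)])
```

The same predicate style extends to other criteria, such as restricting sites to a range of sequence positions.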

Fig. 9. Base filtering by sequence position, with genetic information displayed. This shows sequences filtered to positions 101396 to 124874, together with the ORF section and the details of a specific position.
