• Title/Summary/Keyword: Giga-sequencing


CNVR Detection Reflecting the Properties of the Reference Sequence in HLA Region (레퍼런스 시퀀스의 특성을 고려한 HLA 영역에서의 CNVR 탐지)

  • Lee, Jong-Keun;Hong, Dong-Wan;Yoon, Jee-Hee
    • Journal of KIISE:Computing Practices and Letters / v.16 no.6 / pp.712-716 / 2010
  • In this paper, we propose a novel shape-based approach to detecting CNV regions (CNVRs) by analyzing the coverage graph obtained by aligning giga-sequencing data to the human reference sequence. The proposed algorithm proceeds in two steps: a filtering step and a post-processing step. In the filtering step, it takes several shape parameters as input and extracts candidate CNVRs of various depths and widths. In the post-processing step, it revises the candidate regions to compensate for errors that may be present in the reference sequence and the giga-sequencing data, filters out regions with a high GC content, and returns the final result set. To demonstrate the advantages of our approach, we performed extensive experiments using giga-sequencing data publicly released by the 1000 Genomes Project and assessed accuracy by comparing our results with the CNVs registered in the Database of Genomic Variants (DGV). The results show that our approach successfully finds CNVRs of various shapes (gains and losses) in the HLA (human leukocyte antigen) region.

A CNV detection algorithm based on statistical analysis of the aligned reads (정렬된 리드의 통계적 분석을 기반으로 하는 CNV 검색 알고리즘)

  • Hong, Sang-Kyoon;Hong, Dong-Wan;Yoon, Jee-Hee;Kim, Baek-Sop;Park, Sang-Hyun
    • The KIPS Transactions:PartD / v.16D no.5 / pp.661-672 / 2009
  • Recent studies have found that various structural variations, such as copy number variations (CNVs), exist in the human genome, and that these variations are closely related to disease susceptibility, response to treatment, and genetic traits. In this paper, we propose a new CNV detection algorithm that uses the millions of short DNA sequences (reads) generated by giga-sequencing technology. Our method maps the reads onto the reference sequence and computes the occurrence frequency of each read along the reference. It then analyzes the distribution of these occurrence frequencies and reports statistically significant regions longer than 1 kbp as candidate CNV regions. To select a suitable read alignment method, we implemented several alignment methods within our algorithm and compared their performance. We performed extensive experiments to demonstrate the advantages of our approach: simulation experiments using NCBI reference sequence build 35 showed that it successfully finds all the CNV regions of various shapes and arbitrary lengths (small, intermediate, or large).
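As a rough illustration of the mapping and counting step, the sketch below uses exact matching against a hash index of reference k-mers as a stand-in for a real read aligner; the abstract says several alignment methods were compared but does not name them. Splitting a multi-mapping read's weight evenly across its hits is likewise a hypothetical choice, not the authors' stated rule, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(ref: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def occurrence_frequency(ref: str, reads: List[str]) -> List[float]:
    """Per-position occurrence frequency of exactly matching reads.

    Assumptions (hypothetical): all reads share one length k, and a read that
    matches m reference positions (e.g. in a repeat) adds 1/m to each of them,
    so every mapped read contributes a total weight of 1. Unmatched reads are
    ignored.
    """
    freq = [0.0] * len(ref)
    if not reads:
        return freq
    index = build_index(ref, len(reads[0]))
    for read in reads:
        hits = index.get(read, [])  # .get does not insert keys into the defaultdict
        for pos in hits:
            freq[pos] += 1.0 / len(hits)
    return freq
```

For example, with `ref = "ACGTACGTTT"`, the read `"ACGT"` occurs twice and contributes 0.5 at positions 0 and 4, `"GTTT"` contributes 1.0 at position 6, and `"CCCC"` is unmapped.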

A CNV Detection Algorithm (CNV 영역 검색 알고리즘)

  • Hong, Sang-Kyoon;Hong, Dong-Wan;Yoon, Jee-Hee
    • Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.356-359 / 2008
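  • Research on copy number variation (CNV) in the human genome has recently attracted attention in bioinformatics. A CNV region is defined as a variant region in which a sequence between 1 kbp and 3 Mbp long is duplicated or deleted. In previous work, we proposed a new CNV detection method that finds CNV regions by aligning reads, the DNA sequence fragments produced by giga-sequencing, to the reference sequence. As a follow-up, this paper proposes a new way to address the problem of repeat regions in DNA sequences and presents a CNV region detection algorithm that finds CNV regions by analyzing the occurrence frequencies of reads. The proposed algorithm extracts statistically significant regions from occurrence frequency data that follow a Gaussian distribution as CNV region candidates, and then obtains the final CNV regions through a subsequent refinement step. For performance evaluation, we developed a prototype system and ran simulation experiments. The experimental results show that the proposed method efficiently detects CNV regions in both duplicated and deleted forms, as well as CNV regions of various sizes.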
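One way to realize a Gaussian significance test of this kind is sketched below, under stated assumptions: occurrence frequencies are pre-aggregated into fixed-size windows, a normal distribution is fitted to all windows with `statistics.NormalDist.from_samples`, windows whose two-sided p-value falls below a hypothetical cutoff `alpha` are flagged, and runs of same-direction windows are kept only if their length lies within the 1 kbp to 3 Mbp CNV definition. The window size, cutoff, and function name are illustrative, and the refinement step is not modeled.

```python
from statistics import NormalDist
from typing import List, Tuple


def cnv_candidates(window_freq: List[float], window: int, alpha: float = 0.001,
                   min_len: int = 1_000, max_len: int = 3_000_000) -> List[Tuple[int, int, str]]:
    """Return (start_bp, end_bp, kind) for runs of significant windows.

    window_freq[k] is the (hypothetical) aggregated occurrence frequency of
    window k, which covers bases [k * window, (k + 1) * window).
    """
    fit = NormalDist.from_samples(window_freq)  # needs at least 2 samples
    if fit.stdev == 0:
        return []
    std_normal = NormalDist()

    def direction(x: float) -> int:
        z = (x - fit.mean) / fit.stdev
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p >= alpha:
            return 0
        return 1 if z > 0 else -1

    labels = [direction(x) for x in window_freq]
    out: List[Tuple[int, int, str]] = []
    k = 0
    while k < len(labels):
        if labels[k] == 0:
            k += 1
            continue
        j = k
        while j < len(labels) and labels[j] == labels[k]:
            j += 1  # extend the run of windows with the same direction
        start, end = k * window, j * window
        if min_len <= end - start <= max_len:
            out.append((start, end, "gain" if labels[k] > 0 else "loss"))
        k = j
    return out
```

Fitting the distribution to all windows, including the CNVs themselves, inflates the estimated spread; a more careful version might estimate the mean and deviation robustly or iteratively exclude flagged windows.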

Whole Genome Resequencing of Heugu (Korean Black Cattle) for the Genome-Wide SNP Discovery

  • Choi, Jung-Woo;Chung, Won-Hyong;Lee, Kyung-Tai;Choi, Jae-Won;Jung, Kyoung-Sub;Cho, Yongmin;Kim, Namshin;Kim, Tae-Hun
    • Food Science of Animal Resources / v.33 no.6 / pp.715-722 / 2013
  • Heugu (Korean Black Cattle) is one of the indigenous cattle breeds of Korea; however, genomic studies of the breed have been severely lacking. In this study, we report the first whole genome resequencing of Heugu, performed at high sequence coverage on the Illumina HiSeq 2000 platform. More than 153.6 gigabase pairs (Gbp) of sequence were obtained, and 97% of the reads were mapped to the bovine reference sequence assembly (UMD 3.1). The non-redundantly mapped reads correspond to approximately 28.9-fold coverage across the genome. From these data, we identified a total of over six million single nucleotide polymorphisms (SNPs), of which 29.4% were novel relative to the single nucleotide polymorphism database (dbSNP) build 137. Extensive annotation of all the detected SNPs showed that most of them (70.7%) were located in intergenic regions, which agrees well with previous studies. Among all SNPs, we identified 13,979 non-synonymous SNPs in 5,999 genes, which could potentially affect meat quality traits in cattle. These results provide genome-wide SNPs that can serve as useful genetic tools and as candidates in searches for phenotype-altering DNA differences associated with meat quality traits in cattle. The importance of this study is further underscored by its being the first whole genome sequencing of this valuable local genetic resource, which can be used in further comparative genomic studies with diverse cattle breeds.