• Title/Abstract/Keywords: whole-genome analysis

Search results: 337 items (processing time: 0.018 s)

Whole-genome sequence analysis through online web interfaces: a review

  • Gunasekara, A.W.A.C.W.R.;Rajapaksha, L.G.T.G.;Tung, T.L.
    • Genomics & Informatics
    • /
    • Vol. 20, No. 1
    • /
    • pp.3.1-3.10
    • /
    • 2022
  • The recent development of whole-genome sequencing technologies paved the way for understanding the genomes of microorganisms. Every whole-genome sequencing (WGS) project requires a considerable cost and a massive effort to address the questions at hand. The final step of WGS is data analysis. The analysis of whole-genome sequences depends on highly sophisticated bioinformatics tools that research personnel have to buy. However, many laboratories and research institutions do not have the bioinformatics capabilities to analyze the genomic data and are therefore unable to take maximum advantage of whole-genome sequencing. In this respect, this study provides a guide for research personnel to a set of bioinformatics tools available online that can be used to analyze whole-genome sequence data of bacterial genomes. The web interfaces described here have many advantages and, in most cases, eliminate the need for costly analysis tools and intensive computing resources.

Development of Workbench for Analysis and Visualization of Whole Genome Sequence

  • 최정현;진희정;김철민;장철훈;조환규
    • The KIPS Transactions: Part A (정보처리학회논문지A)
    • /
    • Vol. 9A, No. 3
    • /
    • pp.387-398
    • /
    • 2002
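  • With the recent surge of small-scale genome projects, the complete genome sequences of many organisms have been determined, and research on individual genes and their functions that takes the whole genome as its basic unit has become very active. Because a whole-genome sequence is a large text data set ranging from millions to tens of billions of base pairs (bp), analyzing it with a simple on-line string matching algorithm is very inefficient. This paper describes the development of a workbench for the analysis and visualization of genome sequences, built on the string B-tree, a data structure well suited to analyzing large genome sequences. The system is divided into two main parts: a query part and a visualization part. The query part offers basic pattern searches, such as finding the positions and the number of occurrences of a given sequence in the genome or finding the sequences that occur k times, together with a variety of queries for k-mer analysis. The visualization part displays the whole-genome sequence and its annotation, and provides several visualization methods that ease genome analysis, including CGR (Chaos Game Representation), k-mer graphs, and RWP (Random Walk Plot), so that biologists can readily grasp the overall structure and characteristics of a genome. The proposed analysis system can be used to clarify evolutionary relationships among organisms and to study as yet unknown genes within chromosomes and the functions of junk DNA whose role has not been determined.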
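
The queries and the CGR view described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it scans a plain in-memory string instead of using a string B-tree, so it only suits small inputs. The function names and the CGR corner assignment (A, C, G, T at the corners of the unit square, one common convention) are assumptions made for illustration.

```python
from collections import Counter

# Assumed corner layout for the Chaos Game Representation (one common convention).
CGR_CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def find_occurrences(genome: str, pattern: str) -> list[int]:
    """Return the 0-based start of every match, overlapping matches included."""
    positions = []
    start = genome.find(pattern)
    while start != -1:
        positions.append(start)
        start = genome.find(pattern, start + 1)
    return positions


def kmer_counts(genome: str, k: int) -> Counter:
    """Count every substring of length k in the genome."""
    return Counter(genome[i:i + k] for i in range(len(genome) - k + 1))


def kmers_occurring(genome: str, k: int, times: int) -> list[str]:
    """Return, in sorted order, the k-mers that occur exactly `times` times."""
    return sorted(kmer for kmer, n in kmer_counts(genome, k).items() if n == times)


def cgr_points(genome: str) -> list[tuple[float, float]]:
    """Map a sequence to CGR coordinates.

    Starting from the centre of the unit square, each base moves the current
    point halfway toward that base's corner. Unknown symbols are skipped.
    """
    x, y = 0.5, 0.5
    points = []
    for base in genome:
        if base not in CGR_CORNERS:
            continue
        cx, cy = CGR_CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


if __name__ == "__main__":
    toy = "ACGTACGTAC"  # toy sequence, not real data
    print(find_occurrences(toy, "ACG"))
    print(kmers_occurring(toy, 2, 2))
    print(cgr_points("AC"))
```

Plotting the points returned by `cgr_points` as a scatter plot gives the CGR image. For genome-scale input, an index such as the string B-tree named in the abstract would replace the linear scans used here.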

Generation and analysis of whole-genome sequencing data in human mammary epithelial cells

  • Jong-Lyul Park;Jae-Yoon Kim;Seon-Young Kim;Yong Sun Lee
    • Genomics & Informatics
    • /
    • Vol. 21, No. 1
    • /
    • pp.11.1-11.5
    • /
    • 2023
  • Breast cancer is the most common cancer worldwide, and advanced breast cancer with metastases is largely incurable with currently available therapies. Therefore, it is essential to understand molecular characteristics during the progression of breast carcinogenesis. Here, we report a dataset of whole genomes from the human mammary epithelial cell system derived from a reduction mammoplasty specimen. This system comprises pre-stasis 184D cells, considered normal, and seven cell lines along a cancer progression series that are immortalized or have additionally acquired anchorage-independent growth. Our analysis of the whole-genome sequencing (WGS) data indicates that the seven cancer progression series cell lines have somatic mutations whose number ranges from 8,393 to 39,564 (with an average of 30,591) compared to 184D cells. These WGS data and our mutation analysis will provide helpful information to identify driver mutations and elucidate molecular mechanisms for breast carcinogenesis.

Generation of Whole-Genome Sequencing Data for Comparing Primary and Castration-Resistant Prostate Cancer

  • Park, Jong-Lyul;Kim, Seon-Kyu;Kim, Jeong-Hwan;Yun, Seok Joong;Kim, Wun-Jae;Kim, Won Tae;Jeong, Pildu;Kang, Ho Won;Kim, Seon-Young
    • Genomics & Informatics
    • /
    • Vol. 16, No. 3
    • /
    • pp.71-74
    • /
    • 2018
  • Because castration-resistant prostate cancer (CRPC) does not respond to androgen deprivation therapy and has a very poor prognosis, it is critical to identify a prognostic indicator for predicting high-risk patients who will develop CRPC. Here, we report a dataset of whole genomes from four pairs of primary prostate cancer (PC) and CRPC samples. The analysis of the paired PC and CRPC samples in the whole-genome data showed that the average number of somatic mutations per patient was 7,927 in CRPC tissues compared with primary PC tissues (range, 1,691 to 21,705). Our whole-genome sequencing data of primary PC and CRPC may be useful for understanding the genomic changes and molecular mechanisms that occur during the progression from PC to CRPC.

Whole genome sequence analyses of thermotolerant Bacillus sp. isolates from food

  • Phornphan Sornchuer;Kritsakorn Saninjuk;Pholawat Tingpej
    • Genomics & Informatics
    • /
    • Vol. 21, No. 3
    • /
    • pp.35.1-35.12
    • /
    • 2023
  • The Bacillus cereus group, also known as B. cereus sensu lato (B. cereus s.l.), is composed of various Bacillus species, some of which can cause diarrheal or emetic food poisoning. Several emerging highly heat-resistant Bacillus species have been identified; these include B. thermoamylovorans, B. sporothermodurans, and B. cytotoxicus NVH 391-98. Herein, we performed whole genome analysis of two thermotolerant Bacillus sp. isolates, Bacillus sp. B48 and Bacillus sp. B140, from an omelet with acacia leaves and fried rice, respectively. Phylogenomic analysis suggested that Bacillus sp. B48 and Bacillus sp. B140 are closely related to B. cereus and B. thuringiensis, respectively. Whole genome alignment of Bacillus sp. B48, Bacillus sp. B140, mesophilic strain B. cereus ATCC14579, and thermophilic strain B. cytotoxicus NVH 391-98 using the Mauve program revealed the presence of numerous homologous regions, including genes responsible for heat shock in the dnaK gene cluster. However, the presence of a DUF4253 domain-containing protein was observed only in the genome of B. cereus ATCC14579, while the intracellular protease PfpI family was present only in the chromosome of B. cytotoxicus NVH 391-98. In addition, prophage Clp protease-like proteins were found in the genomes of both Bacillus sp. B48 and Bacillus sp. B140 but not in the genome of B. cereus ATCC14579. The genomic profiles of the Bacillus sp. isolates, especially those relating to heat-responsive gene clusters, were identified using whole genome analysis. The findings presented in this study lay the foundations for subsequent studies to reveal further insights into the molecular mechanisms of heat resistance in Bacillus species.

No excessive mutations in transcription activator-like effector nuclease-mediated α-1,3-galactosyltransferase knockout Yucatan miniature pigs

  • Choi, Kimyung;Shim, Joohyun;Ko, Nayoung;Park, Joonghoon
    • Asian-Australasian Journal of Animal Sciences
    • /
    • Vol. 33, No. 2
    • /
    • pp.360-372
    • /
    • 2020
  • Objective: Specific genomic sites can be recognized and permanently modified by genome editing. The discovery of endonucleases has advanced genome editing in pigs, attenuating xenograft rejection and cross-species disease transmission. However, off-target mutagenesis caused by these nucleases is a major barrier to putative clinical applications. Furthermore, off-target mutagenesis by genome editing has not yet been addressed in pigs. Methods: Here, we generated genetically inheritable α-1,3-galactosyltransferase (GGTA1) knockout Yucatan miniature pigs by combining transcription activator-like effector nuclease (TALEN) and nuclear transfer. For precise estimation of genomic mutations induced by TALEN in GGTA1 knockout pigs, we obtained the whole-genome sequence of the donor cells for use as an internal control genome. Results: In-depth whole-genome sequencing analysis demonstrated that TALEN-mediated GGTA1 knockout pigs had a comparable mutation rate to homologous recombination-treated pigs and wild-type strain controls. RNA sequencing analysis associated with genomic mutations revealed that TALEN-induced off-target mutations had no discernible effect on RNA transcript abundance. Conclusion: Therefore, TALEN appears to be a precise and safe tool for generating genome-edited pigs, and the TALEN-mediated GGTA1 knockout Yucatan miniature pigs produced in this study can serve as a safe and effective organ and tissue resource for clinical applications.

Genome analysis of Bacteroides sp. CACC 737 isolated from feline for its potential application

  • Kim, Jung-Ae;Jung, Min Young;Kim, Dae-Hyuk;Kim, Yangseon
    • Journal of Animal Science and Technology
    • /
    • Vol. 62, No. 6
    • /
    • pp.952-955
    • /
    • 2020
  • Bacteroides sp. CACC 737 was isolated from a feline, and its potential probiotic properties were characterized using functional genome analysis. Whole-genome sequencing was performed using the PacBio RSII and Illumina HiSeq platforms. The complete genome of strain CACC 737 was 4.6 Mb in size, with a guanine (G) + cytosine (C) content of 45.8%, and had six cryptic plasmids and an extracellular polysaccharide gene as unique features. The strain was beneficial to animal health when consumed as feed, for example, for ameliorating immunological dysfunctions and metabolic disorders. The genome information adds to the comprehensive understanding of Bacteroides sp. and suggests potential animal-related industrial applications for this strain.

Draft genome of Semisulcospira libertina, a species of freshwater snail

  • Gim, Jeong-An;Baek, Kyung-Wan;Hah, Young-Sool;Choo, Ho Jin;Kim, Ji-Seok;Yoo, Jun-Il
    • Genomics & Informatics
    • /
    • Vol. 19, No. 3
    • /
    • pp.32.1-32.10
    • /
    • 2021
  • Semisulcospira libertina, a species of freshwater snail, is widespread in East Asia. It is important as a food source. Additionally, it is a vector of clonorchiasis, paragonimiasis, metagonimiasis, and other parasitic diseases. Although S. libertina has ecological, commercial, and clinical importance, its whole genome has not yet been reported. Here, we revealed the genome of S. libertina through de novo assembly. We assembled the whole genome of S. libertina and determined its transcriptome for the first time using the Illumina NovaSeq 6000 platform. According to the k-mer analysis, the genome size of S. libertina was estimated to be 3.04 Gb. Using RepeatMasker, repeats accounting for 53.68% of the genome assembly were identified. The genome data of S. libertina reported in this study will be useful for the identification and conservation of S. libertina in East Asia.
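
The abstract does not say how the k-mer analysis produced its genome size estimate. As a hedged illustration of the general idea only, the Python sketch below applies one widely used approximation: divide the total number of k-mer occurrences by the depth at the main peak of the k-mer depth histogram. The function name, the `min_depth` cutoff for discarding likely sequencing-error k-mers, and the toy histogram are assumptions, not values from the study.

```python
def estimate_genome_size(histogram: dict[int, int], min_depth: int = 3) -> float:
    """Estimate genome size from a k-mer depth histogram.

    `histogram` maps a depth (how many times a k-mer was seen) to the number
    of distinct k-mers observed at that depth. K-mers below `min_depth` are
    treated as sequencing errors and ignored (an assumed, adjustable cutoff).
    """
    solid = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")
    # Depth with the most distinct k-mers: the main (homozygous) peak in the
    # simplest case of a single-peak histogram.
    peak_depth = max(solid, key=solid.get)
    # Total k-mer occurrences contributed by the retained k-mers.
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    # Toy histogram for illustration; not data from the study.
    toy_histogram = {1: 1000, 2: 100, 19: 50, 20: 200, 21: 60}
    print(estimate_genome_size(toy_histogram))
```

Real histograms from heterozygous or repeat-rich genomes often have several peaks, so dedicated tools fit a statistical model rather than taking a single maximum as this sketch does.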

Whole-Genome Analysis of CC224 Listeria monocytogenes Strain IJPL9-1, Clonally Related to the Listeriosis Outbreak Strain in 2018, Isolated from Pork in Korea

  • Mi Ru Lee;Kun Taek Park
    • Microbiology and Biotechnology Letters (한국미생물·생명공학회지)
    • /
    • Vol. 52, No. 3
    • /
    • pp.328-330
    • /
    • 2024
  • Listeriosis is a serious foodborne disease caused mainly by consumption of food contaminated with Listeria monocytogenes. In this study, we isolated L. monocytogenes strain IJPL9-1 from pork in Korea and conducted whole-genome sequencing (WGS). WGS data revealed a single chromosome of 2,913,085 bp. The strain was identified as sequence type (ST) 224, clonal complex (CC) 224, lineage I, and sub-lineage (SL) 6178 based on multilocus sequence typing (MLST) and core genome MLST (cgMLST). The average nucleotide identity was 95.15% with the reference genome EGD-e and 99.99% with FSCNU_000110, the outbreak strain in Korea in 2018. The serogroup was determined to be IIb, and the presence of antimicrobial resistance genes fosX, vga(G), mprF, norB, and sul was determined.

Whole genome sequencing of foot-and-mouth disease virus using benchtop next generation sequencing (NGS) system

  • Moon, Sung-Hyun;Oh, Yeonsu;Tark, Dongseob;Cho, Ho-Seong
    • Korean Journal of Veterinary Service (한국동물위생학회지)
    • /
    • Vol. 42, No. 4
    • /
    • pp.297-300
    • /
    • 2019
  • In countries with FMD vaccination, as in Korea, typical clinical signs do not appear, and even in FMD-positive cases, it is difficult to isolate the FMDV or obtain its whole genome sequence. To overcome this problem, a more rapid and simple NGS system is required to control FMD in Korea. FMDV (O/Boeun/SKR/2017) RNA was extracted and sequenced using Ion Torrent's benchtop sequencer with an amplicon panel and optimized bioinformatics pipelines. Whole genome sequencing generated 1,839,864 reads (mean read length 283 bp) comprising a total of 521,641,058 bases (≥Q20: 475,327,721). Compared with FMDV (GenBank accession No. MG983730), the FMDV sequences in this study showed 99.83% nucleotide identity. Further study is needed to identify these differences. In this study, a fast and robust method using a benchtop next generation sequencing (NGS) system was developed for the analysis of foot-and-mouth disease virus (FMDV) whole genome sequences.
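
The read statistics in this abstract can be cross-checked with simple arithmetic. The short Python sketch below assumes that the two large totals are base counts (the abstract gives no unit) and uses hypothetical variable names. It confirms that the total divided by the read count matches the stated mean read length, and it derives the share of bases at or above Q20.

```python
# Figures quoted in the abstract; treating the totals as base counts is an assumption.
reads = 1_839_864
total_bases = 521_641_058
q20_bases = 475_327_721

mean_read_length = total_bases / reads   # bases per read
q20_fraction = q20_bases / total_bases   # share of bases with quality >= Q20

print(f"mean read length: {mean_read_length:.1f} bp")
print(f"bases at >= Q20: {q20_fraction:.1%}")
```

The computed mean is consistent with the reported 283 bp, which supports reading the totals as bases rather than reads.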