• Title/Summary/Keyword: long-read sequencing


Storing Digital Information in Long-Read DNA

  • Ahn, TaeJin;Ban, Hamin;Park, Hyunsoo
    • Genomics & Informatics / v.16 no.4 / pp.30.1-30.6 / 2018
  • There is an urgent need for effective and cost-efficient data storage, as the worldwide requirement for data storage is growing rapidly. DNA has emerged as a new medium for storing digital information. Recent studies have successfully stored digital information such as text and GIF animations. Previous studies tackled technical hurdles caused by errors in DNA synthesis and sequencing, and focused on strategies that use 100-150-bp read sizes in both synthesis and sequencing. In this paper, we suggest a novel data encoding/decoding scheme that makes use of long-read DNA (~1,000 bp). This enables accurate recovery of stored digital information with a smaller number of reads than the previous approach, and it also reduces sequencing time.

Analysis of unmapped regions associated with long deletions in Korean whole genome sequences based on short read data

  • Lee, Yuna;Park, Kiejung;Koh, Insong
    • Genomics & Informatics / v.17 no.4 / pp.40.1-40.9 / 2019
  • While studies aimed at detecting and analyzing indels or single nucleotide polymorphisms within human genomic sequences have been actively conducted, studies on detecting long insertions/deletions are harder to carry out. Over the last 10 years, the availability of long-read data of human genomes from PacBio or Nanopore platforms has increased, which makes it easier to detect long insertions/deletions. However, because long-read data remain relatively costly, much next-generation sequencing data is still produced mainly by short-read sequencing machines. Here, we constructed programs to detect so-called unmapped regions (UMRs, where no reads are mapped to the reference genome), scanned 40 Korean genomes to select UMR long deletion candidates, and compared the candidates with the long deletion break points within the genomes available from the 1000 Genomes Project (1KGP). An average of about 36,000 UMRs were found in the 40 Korean genomes tested, 284 UMRs were common across the 40 genomes, and a total of 37,943 UMRs were found. Compared with the 74,045 break points provided by the 1KGP, 30,698 UMRs overlapped. As the number of compared samples increased from 1 to 40, the number of UMRs that overlapped with the break points also increased, eventually reaching a peak of 80.9% of the total UMRs found in this study. As the total number of overlapped UMRs could probably grow to encompass the 74,045 break points with the inclusion of more Korean genomes, this approach could be practically useful for studies on long deletions utilizing short-read data.
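The abstract above defines a UMR as a stretch of the reference genome to which no reads are mapped. A minimal sketch of that idea follows, assuming reads are already available as half-open `(start, end)` intervals on a single chromosome. The function name `find_unmapped_regions` and the `min_length` threshold are hypothetical illustrations, not the authors' programs.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in reference coordinates


def find_unmapped_regions(
    reads: List[Interval], chrom_length: int, min_length: int = 1
) -> List[Interval]:
    """Return maximal intervals of the chromosome not covered by any read.

    `min_length` (assumed parameter, must be >= 1) filters out short gaps so
    that only candidate long deletions are reported.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")

    umrs: List[Interval] = []
    covered_until = 0  # everything left of this position is already handled

    # Sweep over reads sorted by start position, tracking the covered prefix.
    for start, end in sorted(reads):
        if start - covered_until >= min_length:
            umrs.append((covered_until, start))
        covered_until = max(covered_until, min(end, chrom_length))

    # Trailing gap between the last covered base and the chromosome end.
    if chrom_length - covered_until >= min_length:
        umrs.append((covered_until, chrom_length))
    return umrs


example = find_unmapped_regions(
    [(0, 100), (50, 150), (400, 500)], chrom_length=600, min_length=50
)
```

In practice the read intervals would come from an alignment file, and candidates would then be compared against known deletion break points, as the study does with the 1KGP set.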

Novel High-Throughput DNA Part Characterization Technique for Synthetic Biology

  • Bak, Seong-Kun;Seong, Wonjae;Rha, Eugene;Lee, Hyewon;Kim, Seong Keun;Kwon, Kil Koang;Kim, Haseong;Lee, Seung-Goo
    • Journal of Microbiology and Biotechnology / v.32 no.8 / pp.1026-1033 / 2022
  • This study presents a novel DNA part characterization technique that increases throughput by combining combinatorial DNA part assembly, a solid plate-based quantitative fluorescence assay for phenotyping, and barcode tagging-based long-read sequencing for genotyping. We confirmed that the fluorescence intensities of colonies on plates were comparable to fluorescence at the single-cell level measured with a high-end flow cytometry device, and we developed a high-throughput image analysis pipeline. The barcode tagging-based long-read sequencing technique enabled rapid identification of all DNA parts and their combinations in a single sequencing experiment. Using our techniques, forty-four DNA parts (21 promoters and 23 RBSs) were successfully characterized in 72 h without any automated equipment. We anticipate that this high-throughput and easy-to-use part characterization technique will contribute to increasing part diversity and be useful for building genetic circuits and metabolic pathways in synthetic biology.

Comparison of the Performance of MiSeq and HiSeq 2500 in a Microbiome Study

  • Na, Hee Sam;Yu, Yeuni;Kim, Si Yeong;Lee, Jae-Hyung;Chung, Jin
    • Microbiology and Biotechnology Letters / v.48 no.4 / pp.574-581 / 2020
  • Next-generation sequencing is commonly used to characterize the microbiome structure. MiSeq is commonly used to analyze the microbiome due to its relatively long read length. Recently, however, Illumina introduced the 250x2 chip for HiSeq 2500. The purpose of this study was to compare the performance of MiSeq and HiSeq in the context of oral microbiome samples. The MiSeq Reagent Kit V3 and the HiSeq Rapid SBS Kit V2 were used for MiSeq and HiSeq 2500 analyses, respectively. Total read count, read quality score, relative bacterial abundance, community diversity, and relative abundance correlation were analyzed. HiSeq produced significantly more read sequences and assigned taxa than MiSeq, whereas community diversity was similar between the two platforms. However, the correlation between the two platforms differed depending on relative abundance. The correlation between HiSeq and MiSeq sequencing data for highly abundant taxa (> 2%), low-abundance taxa (0.2-2%), and rare taxa (< 0.2%) was 0.994, 0.860, and 0.416, respectively. Therefore, HiSeq 2500 may also be suitable for microbiome studies. Importantly, the HiSeq platform may allow high-resolution massively parallel sequencing for the detection of rare taxa.

A Comparative Analysis of the Illumina Truseq Synthetic Long-read Haplotyping Sequencing Platform versus the 10X Genomics Chromium Genome Sequencing Platform for Haplotype Phasing and the Identification of Single-nucleotide variants (SNVs) in Hanwoo (Korean Native Cattle)

  • Park, Woncheoul;Srikanth, Krishnamoorthy;Park, Jong-Eun;Shin, Donghyun;Ko, Haesu;Lim, Dajeong;Cho, In-Cheol
    • Journal of Life Science / v.29 no.1 / pp.1-8 / 2019
  • In Hanwoo cattle (Korean native cattle), there is a scarcity of comparative analyses using high-depth sequencing and haplotype phasing, particularly comparisons of the Truseq Synthetic Long-Read Haplotyping sequencing platform serviced by Illumina (TSLRH) with the Chromium Genome Sequencing platform serviced by 10X Genomics (10XG). DNA was extracted from the sperm of a Hanwoo breeding bull (ID: TN1505D2184/27214) provided by the Hanwoo research center and used to generate sequence data on both sequencing platforms. We then identified SNVs using an analysis pipeline tailored to each platform. The TSLRH and 10XG platforms generated a total of 355,208,304 and 1,632,772,004 reads, respectively, with Q30 values of 89.04% and 88.60%, of which 351,992,768 (99.09%) and 1,526,641,824 (93.50%) were successfully mapped. For the TSLRH and 10XG platforms, the mean sequencing depth was 13.04X and 74.3X, the longest phase block was 1,982,706 bp and 1,480,081 bp, the N50 phase block was 57,637 bp and 114,394 bp, the total number of SNVs identified was 4,534,989 and 8,496,813, and the total phased rate was 72.29% and 87.67%, respectively. Moreover, for each chromosome, we identified unique and common SNVs using both sequencing platforms. The number of SNVs was directly proportional to the length of the chromosome. Based on our results, we recommend the use of the 10XG platform for haplotype phasing and SNV identification, as it generated a longer N50 phase block, in addition to a higher mean depth, total number of reads, total number of SNVs, and phase rate, than the TSLRH platform.
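The N50 phase block reported above is a length-weighted summary statistic. As a minimal sketch under the common definition (the length of the block at which the cumulative length, summed from longest to shortest, first reaches half the total), it could be computed as follows. The function name `n50` and the sample lengths are illustrative only and are not taken from the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of block lengths (in bp).

    Blocks are visited from longest to shortest; the N50 is the length of the
    block at which the running total first reaches half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one block length is required")

    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid fractional arithmetic.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical phase block lengths, not data from the paper.
example_n50 = n50([10, 20, 30, 40])
```

Because long blocks dominate the cumulative sum, a platform can have a shorter longest block yet a larger N50, which is the pattern the abstract reports for 10XG relative to TSLRH.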

Toward Complete Bacterial Genome Sequencing Through the Combined Use of Multiple Next-Generation Sequencing Platforms

  • Jeong, Haeyoung;Lee, Dae-Hee;Ryu, Choong-Min;Park, Seung-Hwan
    • Journal of Microbiology and Biotechnology / v.26 no.1 / pp.207-212 / 2016
  • PacBio's long-read sequencing technologies can be successfully used for complete bacterial genome assembly using recently developed non-hybrid assemblers in the absence of second-generation, high-quality short reads. However, standardized procedures that take into account multiple pre-existing second-generation sequencing platforms are scarce. In addition to Illumina HiSeq and Ion Torrent PGM-based genome sequencing results derived from previous studies, we generated further sequencing data, including data from the PacBio RS II platform, and applied various bioinformatics tools to obtain complete genome assemblies for five bacterial strains. Our approach revealed that the hierarchical genome assembly process (HGAP) non-hybrid assembler produced nearly complete assemblies at a moderate coverage of ~75x, but that different versions produced incompatible results requiring post-processing. The other two platforms further improved the PacBio assembly through scaffolding and a final error correction.

Exome and genome sequencing for diagnosing patients with suspected rare genetic disease

  • Go Hun Seo;Hane Lee
    • Journal of Genetic Medicine / v.20 no.2 / pp.31-38 / 2023
  • Although a rare disease is defined in South Korea as one affecting fewer than 20,000 individuals, more than 8,000 rare Mendelian disorders have been identified, and they collectively impact 6-8% of the global population. Many rare diseases pose significant challenges to patients, patients' families, and the healthcare system. The diagnostic journey for rare disease patients is often lengthy and arduous, hampered by the genetic diversity and phenotypic complexity of these conditions. With the advent of next-generation sequencing technology and the clinical implementation of exome sequencing (ES) and genome sequencing (GS), the diagnostic rate for rare diseases has reached 25-50%, depending on the disease category. This technology is also allowing more rapid discovery of new gene-disease associations and equipping us to practice precision medicine by offering tailored medical management plans, early intervention, and family planning options. However, a substantial number of patients remain undiagnosed, which could be due to several factors. Some may not have genetic disorders. Some may have disease-causing variants that are not detectable or interpretable by ES and GS. It is also possible that some patients have a disease-causing variant in a gene that has not yet been linked to a disease. For patients who remain undiagnosed, reanalysis of existing data has shown promise in providing new molecular diagnoses through new gene-disease associations, new variant discovery, and variant reclassification, leading to a 5-10% increase in the diagnostic rate. More advanced approaches such as long-read sequencing, transcriptome sequencing, and integration of multi-omics data may prove valuable in uncovering elusive genetic causes.

Birth of an 'Asian cool' reference genome: AK1

  • Kim, Changhoon
    • BMB Reports / v.49 no.12 / pp.653-654 / 2016
  • The human reference genome, maintained by the Genome Reference Consortium, has arguably been the most complete genome assembly available since its first construction. It has continually been improved by incorporating corrections to previous assemblies, thanks to various technological advances. Many ongoing population sequencing projects are based on this reference genome, and the recent maturation of high-throughput sequencing technologies has heightened hopes for useful medical applications of genomic information. However, a single reference genome does not fit all populations across the globe, because of the large diversity in genomic structures and the technical limitations inherent to short-read sequencing methods. The recent success in de novo construction of the highly contiguous Asian diploid genome AK1, achieved by combining single-molecule technologies with routine sequencing data without resorting to traditional clone-by-clone sequencing and physical mapping, reveals the nature of genomic structural variation by detecting thousands of novel structural variations and by finally filling in some of the gaps that had persisted in the current human reference genome. It is now expected that the AK1 genome, soon to be paired with more de novo assembled genomes, will provide a chance to explore what it is really like to use ancestry-specific reference genomes instead of hg19/hg38 for population genomics. This is a major step toward advancing genetically based precision medicine.

Microbial Community Dysbiosis and Functional Gene Content Changes in Apple Flowers due to Fire Blight

  • Kong, Hyun Gi;Ham, Hyeonheui;Lee, Mi-Hyun;Park, Dong Suk;Lee, Yong Hwan
    • The Plant Pathology Journal / v.37 no.4 / pp.404-412 / 2021
  • Although the plant microbiota plays an important role in plant health, little is known about the potential interactions of the flower microbiota with pathogens. In this study, we investigated the microbial community of apple blossoms infected with Erwinia amylovora. Long-read sequencing technology significantly increased the genome sequence resolution, enabling the characterization of fire blight-induced changes in the flower microbial community. Each sample showed a unique microbial community at the species level. Pantoea agglomerans and P. allii were the most predominant bacteria in healthy flowers, whereas E. amylovora comprised more than 90% of the microbial population in diseased flowers. Furthermore, gene function analysis revealed that glucose and xylose metabolism were enriched in diseased flowers. Overall, our results showed that the microbiome of apple blossoms is rich in specific bacteria, and that the nutritional composition of flowers is important for the incidence and spread of bacterial disease.

Ongoing endeavors to detect mobilization of transposable elements

  • Lee, Yujeong;Ha, Una;Moon, Sungjin
    • BMB Reports / v.55 no.7 / pp.305-315 / 2022
  • Transposable elements (TEs) are DNA sequences capable of mobilizing from one location to another in the genome. Since the discovery of the 'Dissociation (Ds) locus' by Barbara McClintock in maize (1), mounting evidence in the era of genomics indicates that a significant fraction of most eukaryotic genomes is composed of TE sequences, which are involved in various biological processes such as development, physiology, disease, and evolution. Although technical advances in genomics have revealed numerous functional impacts of TEs across species, our understanding of TEs remains incomplete because of challenges arising from the complexity and abundance of TEs in the genome. In this mini-review, we briefly summarize the biology of TEs and their impacts on the host genome, emphasizing the importance of understanding the TE landscape in the genome. We then introduce recent endeavors, especially in vivo retrotransposition assays and long-read sequencing technology, for identifying de novo insertions and TE polymorphisms, which will broaden our knowledge of the extraordinary relationship between these genomic cohabitants and their host.