• Title/Summary/Keyword: RNA-sequencing

Analysis of Whole Transcriptome Sequencing Data: Workflow and Software

  • Yang, In Seok;Kim, Sangwoo
    • Genomics & Informatics / v.13 no.4 / pp.119-125 / 2015
  • RNA is a polymeric molecule implicated in various biological processes, such as the coding, decoding, regulation, and expression of genes. Numerous studies have examined RNA features using whole transcriptome sequencing (RNA-seq) approaches. RNA-seq is a powerful technique for characterizing and quantifying the transcriptome, and it has accelerated the development of bioinformatics software. In this review, we introduce a routine RNA-seq workflow together with related software, focusing particularly on transcriptome reconstruction and expression quantification.

Variational Autoencoder Based Dimension Reduction and Clustering for Single-Cell RNA-seq Gene Expression (단일세포 RNA-SEQ의 유전자 발현 군집화를 위한 변이 자동인코더 기반의 차원감소와 군집화)

  • Chi, Sang-Mun
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.11 / pp.1512-1518 / 2021
  • Since single-cell RNA sequencing provides the expression profiles of individual cells, it offers higher resolution of cell-to-cell differences than traditional bulk RNA sequencing. Using these single-cell RNA sequencing data, clustering analysis is generally conducted to find cell types and understand high-level biological processes. In order to effectively process the high-dimensional single-cell RNA sequencing data for the clustering analysis, this paper uses a variational autoencoder to transform a high-dimensional data space into a lower-dimensional latent space, expecting to produce a latent space that can give more accurate clustering results. By clustering the features in the transformed latent space, we compare the performance of various classical clustering methods for single-cell RNA sequencing data. Experimental results demonstrate that the proposed framework outperforms many state-of-the-art methods under various clustering performance metrics.

Unraveling flavivirus pathogenesis: from bulk to single-cell RNA-sequencing strategies

  • Doyeong Kim;Seonghun Jeong;Sang-Min Park
    • The Korean Journal of Physiology and Pharmacology / v.28 no.5 / pp.403-411 / 2024
  • The global spread of flaviviruses has triggered major outbreaks worldwide, significantly impacting public health, society, and economies. This has intensified research efforts to understand how flaviviruses interact with their hosts and manipulate the immune system, underscoring the need for advanced research tools. RNA-sequencing (RNA-seq) technologies have revolutionized our understanding of flavivirus infections by offering transcriptome analysis to dissect the intricate dynamics of virus-host interactions. Bulk RNA-seq provides a macroscopic overview of gene expression changes in virus-infected cells, offering insights into infection mechanisms and host responses at the molecular level. Single-cell RNA sequencing (scRNA-seq) provides unprecedented resolution by analyzing individual infected cells, revealing remarkable cellular heterogeneity within the host response. A particularly innovative advancement, virus-inclusive single-cell RNA sequencing (viscRNA-seq), addresses the challenges posed by non-polyadenylated flavivirus genomes, unveiling intricate details of virus-host interactions. In this review, we discuss the contributions of bulk RNA-seq, scRNA-seq, and viscRNA-seq to the field, exploring their implications in cell line experiments and studies on patients infected with various flavivirus species. Comprehensive transcriptome analyses from RNA-seq technologies are pivotal in accelerating the development of effective diagnostics and therapeutics, paving the way for innovative treatments and enhancing our preparedness for future outbreaks.

A Study on Predicting Lung Cancer Using RNA-Sequencing Data with Ensemble Learning (앙상블 기법을 활용한 RNA-Sequencing 데이터의 폐암 예측 연구)

  • Geon AN;JooYong PARK
    • Journal of Korea Artificial Intelligence Association / v.2 no.1 / pp.7-14 / 2024
  • In this paper, we explore the application of RNA-sequencing data and ensemble machine learning to predict lung cancer, a leading cause of cancer mortality worldwide, and to inform its treatment strategies. The research utilizes Random Forest, XGBoost, and LightGBM models to analyze gene expression profiles from extensive datasets, aiming to enhance predictive accuracy for lung cancer prognosis. The methodology focuses on preprocessing RNA-seq data to standardize expression levels across samples and applying ensemble algorithms to maximize prediction stability and reduce model overfitting. Key findings indicate that ensemble models, especially XGBoost, substantially outperform traditional predictive models. Significant genetic markers such as ADGRF5 are identified as crucial for predicting lung cancer outcomes. In conclusion, ensemble learning using RNA-seq data proves highly effective in predicting lung cancer, suggesting a potential shift towards more precise and personalized treatment approaches. The results advocate for further integration of molecular and clinical data to refine diagnostic models and improve clinical outcomes, underscoring the critical role of advanced molecular diagnostics in enhancing patient survival rates and quality of life. This study lays the groundwork for future research in the application of RNA-sequencing data and ensemble machine learning techniques in clinical settings.

COEX-Seq: Convert a Variety of Measurements of Gene Expression in RNA-Seq

  • Kim, Sang Cheol;Yu, Donghyeon;Cho, Seong Beom
    • Genomics & Informatics / v.16 no.4 / pp.36.1-36.3 / 2018
  • Next-generation sequencing (NGS), a high-throughput DNA sequencing technology, is widely used for molecular biological studies. Within NGS, RNA-sequencing (RNA-Seq), a short-read massively parallel sequencing method, is a major quantitative tool for different transcriptome studies. To utilize RNA-Seq data, various quantification and analysis methods have been developed to address specific research goals, including identification of differentially expressed genes and detection of novel transcripts. Because of the accumulation of RNA-Seq data in public databases, there is a demand for integrative analysis. However, the available RNA-Seq data are stored in different formats, such as read counts, transcripts per million (TPM), and fragments per kilobase of transcript per million mapped reads (FPKM). This hinders the integrative analysis of RNA-Seq data. To solve this problem, we have developed COEX-Seq (Convert a Variety of Measurements of Gene Expression in RNA-Seq), a web-based application built with Shiny that easily converts data among the gene expression measurement formats used in most bioinformatic tools for RNA-Seq. It provides a workflow that includes loading a data set, selecting measurement formats of gene expression, and identifying gene names. COEX-Seq is freely available for academic purposes and can be run on Windows, Mac OS, and Linux operating systems. Source code, sample data sets, and supplementary documentation are available as well.
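The kind of conversion this abstract describes can be illustrated with the standard definitions of TPM (transcripts per million) and FPKM (fragments per kilobase of transcript per million mapped reads). The sketch below is not the COEX-Seq source code (which is a Shiny application); the function names and the toy inputs are assumptions made for illustration only, and gene lengths are assumed to be supplied in kilobases.

```python
def counts_to_tpm(counts, lengths_kb):
    """Convert raw read counts to TPM (hypothetical helper, standard definition)."""
    # Reads per kilobase: normalise each gene's count by its length.
    rpk = [c / l for c, l in zip(counts, lengths_kb)]
    scale = sum(rpk)
    if scale == 0:
        raise ValueError("all counts are zero; TPM is undefined")
    # Rescale so the values in one sample sum to one million.
    return [r / scale * 1e6 for r in rpk]


def counts_to_fpkm(counts, lengths_kb):
    """Convert raw read counts to FPKM (hypothetical helper, standard definition)."""
    total_millions = sum(counts) / 1e6
    if total_millions == 0:
        raise ValueError("all counts are zero; FPKM is undefined")
    # Normalise by gene length (kb) and by sequencing depth (millions of reads).
    return [c / l / total_millions for c, l in zip(counts, lengths_kb)]


def fpkm_to_tpm(fpkm):
    """Convert FPKM to TPM; only the per-sample FPKM total is needed."""
    total = sum(fpkm)
    if total == 0:
        raise ValueError("all FPKM values are zero; TPM is undefined")
    return [f / total * 1e6 for f in fpkm]


# Toy example: three genes whose counts are proportional to their lengths,
# so all three receive the same TPM.
counts = [100, 200, 300]
lengths_kb = [1.0, 2.0, 3.0]
tpm = counts_to_tpm(counts, lengths_kb)
```

Converting counts to FPKM and then FPKM to TPM gives the same result as converting counts to TPM directly, which is why data stored as FPKM can be brought onto the TPM scale without access to the original counts.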

Development of an RNA sequencing panel to detect gene fusions in thyroid cancer

  • Kim, Dongmoung;Jung, Seung-Hyun;Chung, Yeun-Jun
    • Genomics & Informatics / v.19 no.4 / pp.41.1-41.10 / 2021
  • In addition to mutations and copy number alterations, gene fusions are commonly identified in cancers. In thyroid cancer, fusions of important cancer-related genes have been commonly reported; however, extant panels do not cover all clinically important gene fusions. In this study, we aimed to develop a custom RNA-based sequencing panel to identify the key fusions in thyroid cancer. Our ThyChase panel was designed to detect 87 types of gene fusion. For quality control of RNA sequencing, five housekeeping genes were included in the panel. When we applied this panel to a fusion-containing reference RNA (HD796), three expected fusions (EML4-ALK, CCDC6-RET, and TPM3-NTRK1) were successfully identified. We confirmed the fusion breakpoint sequences of the three fusions from HD796 by Sanger sequencing. Regarding the limit of detection, this panel could detect the target fusions from a tumor sample containing a 1% fusion-positive tumor cellular fraction. Taken together, our ThyChase panel would be useful to identify gene fusions in the clinical field.

Transcriptomic Analysis of Cellular Senescence: One Step Closer to Senescence Atlas

  • Kim, Sohee;Kim, Chuna
    • Molecules and Cells / v.44 no.3 / pp.136-145 / 2021
  • Senescent cells that gradually accumulate during aging are one of the leading causes of aging. While senolytics can improve aging in humans as well as mice by specifically eliminating senescent cells, the effect of senolytics varies among cell types, suggesting variations in senescence. Various factors can induce cellular senescence, and the rate of accumulation of senescent cells differs depending on the organ. In addition, since the heterogeneity is due to the spatiotemporal context of senescent cells, in vivo studies are needed to increase the understanding of senescent cells. Since current methods are often unable to distinguish senescent cells from other cells, efforts are being made to find markers commonly expressed in senescent cells using bulk RNA-sequencing. Moreover, single-cell RNA (scRNA) sequencing, which analyzes the transcripts of each cell, has been utilized to understand the in vivo characteristics of the rare senescent cells. Recently, transcriptomic cell atlases for each organ using this technology have been published in various species. Novel senescent cells that do not express previously established marker genes have been discovered in some organs. However, there is still insufficient information on senescent cells due to the limited throughput of scRNA sequencing technology. Therefore, it is necessary to improve the throughput of scRNA sequencing technology or develop a way to enrich the rare senescent cells. The in vivo senescent cell atlas established using rapidly developing single-cell technologies will contribute to precise rejuvenation by specifically removing senescent cells in each tissue and individual.

Type-specific Amplification of 5S rRNA from Panax ginseng Cultivars Using Touchdown (TD) PCR and Direct Sequencing

  • Sun, Hun;Wang, Hong-Tao;Kwon, Woo-Saeng;Kim, Yeon-Ju;Yang, Deok-Chun
    • Journal of Ginseng Research / v.33 no.1 / pp.55-58 / 2009
  • Generally, direct sequencing of PCR products is faster, easier, cheaper, and more practical than clone sequencing. However, standard PCR amplification is frequently compromised by mispriming at internal or external regions of the target template. Normally, DNA fragments are eluted from the gel using a gel extraction kit and subjected to direct sequencing or clone sequencing. Clone sequencing is often troublesome and takes more time when many samples must be analyzed. Because touchdown (TD) PCR can generate sufficient and highly specific amplification, it reduces the generation of unwanted amplicons. Accordingly, TD PCR is a good method for direct sequencing because it amplifies the desired fragment. In plants, the 5S-rRNA genes are separated by simple spacers. The 5S-rRNA gene sequence is very well conserved between plant species, while the spacer is species-specific. Therefore, the sequence has been used for phylogenetic studies and species identification. However, spurious bands caused by complex genomes frequently appear in the product spectrum of standard PCR amplification. In conclusion, the TD PCR method can be applied easily to the amplification of the main 5S-rRNA and the direct sequencing of Panax ginseng cultivars.

Integrative Comparison of Burrows-Wheeler Transform-Based Mapping Algorithm with de Bruijn Graph for Identification of Lung/Liver Cancer-Specific Gene

  • Ajaykumar, Atul;Yang, Jung Jin
    • Journal of Microbiology and Biotechnology / v.32 no.2 / pp.149-159 / 2022
  • Cancers of the lung and liver are among the top 10 leading causes of cancer death worldwide. Thus, it is essential to identify the genes specifically expressed in these two cancer types to develop new therapeutics. Although many messenger RNA (mRNA) sequencing data related to these cancer cells are available due to the advancement of next-generation sequencing (NGS) technologies, optimized data processing methods need to be developed to identify novel cancer-specific genes. Here, we conducted an analytical comparison between Bowtie2, a Burrows-Wheeler transform-based alignment tool, and Kallisto, which adopts pseudoalignment based on a transcriptome de Bruijn graph, using mRNA sequencing data on normal cells and lung/liver cancer tissues. Before using cancer data, simulated mRNA sequencing reads were generated, and the high Transcripts Per Million (TPM) values were compared. mRNA sequencing read data on lung/liver cancer cells were also extracted and quantified. While Kallisto could directly give the output in TPM values, Bowtie2 provided the counts. Thus, TPM values were calculated by processing the Sequence Alignment Map (SAM) file in R using the package Rsubread and subsequently in Python. The analysis of the simulated sequencing data revealed that Kallisto could detect more transcripts and had a higher overlap than Bowtie2. The evaluation of these two data processing methods using the known lung cancer biomarkers concludes that, in standard settings without any dedicated quality control, Kallisto is more effective at producing faster and more accurate results than Bowtie2. Such conclusions were also drawn and confirmed with the known biomarkers specific to liver cancer.

One-step spectral clustering of weighted variables on single-cell RNA-sequencing data (단세포 RNA 시퀀싱 데이터를 위한 가중변수 스펙트럼 군집화 기법)

  • Park, Min Young;Park, Seyoung
    • The Korean Journal of Applied Statistics / v.33 no.4 / pp.511-526 / 2020
  • Single-cell RNA-sequencing (scRNA-seq) data consist of each cell's RNA expression extracted from large populations of cells. One main purpose of using scRNA-seq data is to identify inter-cellular heterogeneity. However, scRNA-seq data pose statistical challenges when applying traditional clustering methods because they have many missing values and a high level of noise due to technical and sampling issues. In this paper, motivated by analyzing scRNA-seq data, we propose a novel spectral-based clustering method that imposes different weights on genes when computing the similarity between cells. Assigning weights to genes and clustering cells are performed simultaneously in the proposed clustering framework. We solve the proposed non-convex optimization problem using an iterative algorithm. Both a real-data application and a simulation study suggest that the proposed clustering method better identifies underlying clusters compared with existing clustering methods.
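The idea of weighting genes when computing cell-to-cell similarity can be illustrated with a small sketch. This is not the authors' method: the paper learns the gene weights jointly with the clustering through an iterative algorithm, whereas the weights here are fixed by hand. The Gaussian kernel, the function name `weighted_similarity`, and the `sigma` parameter are assumptions chosen for illustration.

```python
import math


def weighted_similarity(cells, weights, sigma=1.0):
    """Gaussian similarity between cells using a gene-weighted squared distance.

    cells   : list of expression vectors, one per cell (same length each)
    weights : one non-negative weight per gene (assumed fixed, not learned)
    """
    n = len(cells)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Weighted squared Euclidean distance between cell i and cell j.
            d2 = sum(w * (a - b) ** 2
                     for w, a, b in zip(weights, cells[i], cells[j]))
            sim[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return sim


# Toy data: gene 0 separates two groups (0 vs 3); gene 1 is pure noise.
cells = [[0.0, 5.0], [0.0, -5.0], [3.0, 5.0], [3.0, -5.0]]
informative = weighted_similarity(cells, [1.0, 0.0])  # ignore the noisy gene
uniform = weighted_similarity(cells, [0.5, 0.5])      # treat both genes equally
```

With the noisy gene down-weighted, cells 0 and 1 (same group) are maximally similar, while under uniform weights the noise dominates and they appear less similar than cells from different groups. A similarity matrix of this kind is the usual input to spectral clustering.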