• Title/Summary/Keyword: RNA sequencing (RNA-seq)

COEX-Seq: Convert a Variety of Measurements of Gene Expression in RNA-Seq

  • Kim, Sang Cheol; Yu, Donghyeon; Cho, Seong Beom
    • Genomics & Informatics / v.16 no.4 / pp.36.1-36.3 / 2018
  • Next-generation sequencing (NGS), a high-throughput DNA sequencing technology, is widely used in molecular biology. Within NGS, RNA sequencing (RNA-Seq), a short-read, massively parallel sequencing method, is a major tool for quantitative transcriptome studies. Various quantification and analysis methods have been developed for RNA-Seq data to address specific research goals, including identification of differentially expressed genes and detection of novel transcripts. As RNA-Seq data accumulate in public databases, demand for integrative analysis is growing. However, the available RNA-Seq data are stored in different measurement formats, such as read counts, transcripts per million (TPM), and fragments per kilobase of transcript per million mapped reads (FPKM), which hinders integrative analysis. To solve this problem, we have developed COEX-seq (Convert a Variety of Measurements of Gene Expression in RNA-Seq), a web-based application built with Shiny that converts data among the gene expression measurement formats used by most bioinformatic tools for RNA-Seq. It provides a workflow that includes loading a data set, selecting the measurement formats of gene expression, and identifying gene names. COEX-seq is freely available for academic purposes and runs on Windows, macOS, and Linux. Source code, sample data sets, and supplementary documentation are also available.

Big Data Analytics in RNA-sequencing (RNA 시퀀싱 기법으로 생성된 빅데이터 분석)

  • Sung-Hun WOO; Byung Chul JUNG
    • Korean Journal of Clinical Laboratory Science / v.55 no.4 / pp.235-243 / 2023
  • As next-generation sequencing has developed and come into wide use, RNA sequencing (RNA-seq) has rapidly emerged as the first-choice tool for global transcriptome profiling. Alongside these advances, various types of RNA-seq have evolved in conjunction with progress in bioinformatic tools. However, without a general understanding of the types of RNA-seq and the bioinformatic approaches, it is difficult to interpret the complex data and the biological meaning underlying them. This paper therefore covers two main topics. First, two major variants of RNA-seq are described and compared with standard RNA-seq, which offers insight into which RNA-seq method is most appropriate for a given study. Second, the most widely used RNA-seq data analyses are discussed: (1) exploratory data analysis and (2) pathway enrichment analysis. The exploratory data analysis section introduces the most widely used methods for RNA-seq, such as principal component analysis, heatmaps, and volcano plots, which reveal overall trends in the dataset. The pathway enrichment analysis section introduces three generations of pathway enrichment analysis and how they generate enriched pathways from an RNA-seq dataset.

Analysis of Whole Transcriptome Sequencing Data: Workflow and Software

  • Yang, In Seok; Kim, Sangwoo
    • Genomics & Informatics / v.13 no.4 / pp.119-125 / 2015
  • RNA is a polymeric molecule implicated in various biological processes, such as the coding, decoding, regulation, and expression of genes. Numerous studies have examined RNA features using whole transcriptome sequencing (RNA-seq) approaches. RNA-seq is a powerful technique for characterizing and quantifying the transcriptome, and it has accelerated the development of bioinformatics software. In this review, we introduce a routine RNA-seq workflow together with related software, focusing particularly on transcriptome reconstruction and expression quantification.

Variational Autoencoder Based Dimension Reduction and Clustering for Single-Cell RNA-seq Gene Expression (단일세포 RNA-SEQ의 유전자 발현 군집화를 위한 변이 자동인코더 기반의 차원감소와 군집화)

  • Chi, Sang-Mun
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.11 / pp.1512-1518 / 2021
  • Because single-cell RNA sequencing provides the expression profiles of individual cells, it resolves differences between cells more finely than traditional bulk RNA sequencing. Clustering analysis is generally conducted on single-cell RNA sequencing data to find cell types and understand high-level biological processes. To process the high-dimensional single-cell RNA sequencing data effectively for clustering analysis, this paper uses a variational autoencoder to transform the high-dimensional data space into a lower-dimensional latent space, with the expectation that the latent space yields more accurate clustering results. By clustering the features in the transformed latent space, we compare the performance of various classical clustering methods on single-cell RNA sequencing data. Experimental results demonstrate that the proposed framework outperforms many state-of-the-art methods under various clustering performance metrics.

Dimensionality Reduction of RNA-Seq Data

  • Al-Turaiki, Isra
    • International Journal of Computer Science & Network Security / v.21 no.3 / pp.31-36 / 2021
  • RNA sequencing (RNA-Seq) is a technology that facilitates transcriptome analysis using next-generation sequencing (NGS) tools. Information on the quantity and sequences of RNA is vital to relate our genomes to functional protein expression. RNA-Seq data are high-dimensional in that the number of variables (i.e., transcripts) far exceeds the number of observations (e.g., experiments). Given the wide range of dimensionality reduction techniques, it is not clear which is best for RNA-Seq data analysis. In this paper, we study the effect of three dimensionality reduction techniques on the classification of an RNA-Seq dataset. In particular, we use PCA, SVD, and SOM to obtain a reduced feature space. We built nine classification models for a cancer dataset and compared their performance. Our experimental results indicate that better classification performance is obtained with PCA and SOM. Overall, the combinations PCA+KNN, SOM+RF, and SOM+KNN produce the preferred results.
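
One of the combinations named in this abstract, PCA followed by KNN, can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's pipeline: the toy expression matrix, the labels, the helper names, the use of a single principal axis found by power iteration, and k = 3 are all made up for the example.

```python
import math

# Illustrative PCA + KNN sketch on toy data. All names and values are
# hypothetical; the paper's dataset, models, and settings are not reproduced.

def center(rows):
    """Subtract each gene's (column's) mean; return centered rows and the means."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows], means

def leading_axis(centered, iters=100):
    """Approximate the first principal axis by power iteration on X^T X.

    Assumes the uniform start vector is not orthogonal to that axis.
    """
    d = len(centered[0])
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        scores = [sum(x * a for x, a in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def project(row, means, axis):
    """Score of one sample on the principal axis (the reduced feature)."""
    return sum((x - m) * a for x, m, a in zip(row, means, axis))

def knn_predict(train_scores, train_labels, query_score, k=3):
    """Majority vote among the k training samples closest in the reduced space."""
    ranked = sorted(zip(train_scores, train_labels),
                    key=lambda pair: abs(pair[0] - query_score))
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy expression matrix: rows are samples, columns are three genes.
samples = [[10, 9, 1], [11, 10, 0], [9, 11, 1], [1, 0, 1], [0, 1, 0], [1, 1, 1]]
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]

centered, means = center(samples)
axis = leading_axis(centered)
train_scores = [project(s, means, axis) for s in samples]

def classify(sample, k=3):
    return knn_predict(train_scores, labels, project(sample, means, axis), k)

print(classify([10, 10, 1]), classify([0, 0, 0]))
```

In practice a library implementation of PCA with several components would be used; the single-axis power iteration here only keeps the sketch dependency-free.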

Integration of Single-Cell RNA-Seq Datasets: A Review of Computational Methods

  • Yeonjae Ryu; Geun Hee Han; Eunsoo Jung; Daehee Hwang
    • Molecules and Cells / v.46 no.2 / pp.106-119 / 2023
  • With the increased number of single-cell RNA sequencing (scRNA-seq) datasets in public repositories, integrative analysis of multiple scRNA-seq datasets has become commonplace. Batch effects among different datasets are inevitable because of differences in cell isolation and handling protocols, library preparation technology, and sequencing platforms. To remove these batch effects for effective integration of multiple scRNA-seq datasets, a number of methodologies have been developed based on diverse concepts and approaches. These methods have proven useful for examining whether cellular features identified in one dataset, such as cell subpopulations and marker genes, are consistently present, or whether their condition-dependent variations, such as increases in cell subpopulations in particular disease-related conditions, are consistently observed in different datasets generated under similar or distinct conditions. In this review, we summarize the concepts and approaches of the integration methods and their pros and cons as reported in the literature.

Identification of Alternative Splicing and Fusion Transcripts in Non-Small Cell Lung Cancer by RNA Sequencing

  • Hong, Yoonki; Kim, Woo Jin; Bang, Chi Young; Lee, Jae Cheol; Oh, Yeon-Mok
    • Tuberculosis and Respiratory Diseases / v.79 no.2 / pp.85-90 / 2016
  • Background: Lung cancer is the most common cause of cancer-related death. Alterations in gene sequence, structure, and expression play an important role in the pathogenesis of lung cancer. Fusion genes and alternative splicing of cancer-related genes have the potential to be oncogenic. In the current study, we performed RNA sequencing (RNA-seq) to investigate potential fusion genes and alternative splicing in non-small cell lung cancer. Methods: RNA was isolated from lung tissues obtained from 86 subjects with lung cancer. The RNA samples from lung cancer and normal tissues were processed with RNA-seq using the HiSeq 2000 system. Fusion genes were evaluated using deFuse and ChimeraScan. Candidate fusion transcripts were validated by Sanger sequencing. Alternative splicing was analyzed using multivariate analysis of transcript splicing (MATS) and validated using quantitative real-time polymerase chain reaction. Results: RNA-seq data identified the oncogenic fusion genes EML4-ALK and SLC34A2-ROS1 in three of 86 normal-cancer paired samples. Nine distinct fusion transcripts were selected using deFuse and ChimeraScan, of which four were validated by Sanger sequencing. In 33 squamous cell carcinomas, 29 tumor-specific skipped exon events and six mutually exclusive exon events were identified. ITGB4 and PYCR1 were the top genes showing significant tumor-specific splice variants. Conclusion: RNA-seq data identified novel potential fusion transcripts and splice variants. Further evaluation of their functional significance in the pathogenesis of lung cancer is required.

One-step spectral clustering of weighted variables on single-cell RNA-sequencing data (단세포 RNA 시퀀싱 데이터를 위한 가중변수 스펙트럼 군집화 기법)

  • Park, Min Young; Park, Seyoung
    • The Korean Journal of Applied Statistics / v.33 no.4 / pp.511-526 / 2020
  • Single-cell RNA-sequencing (scRNA-seq) data consist of each cell's RNA expression extracted from large populations of cells. One main purpose of using scRNA-seq data is to identify inter-cellular heterogeneity. However, scRNA-seq data pose statistical challenges for traditional clustering methods because they have many missing values and a high level of noise due to technical and sampling issues. In this paper, motivated by the analysis of scRNA-seq data, we propose a novel spectral clustering method that imposes different weights on genes when computing the similarity between cells. Assigning weights to genes and clustering cells are performed simultaneously in the proposed clustering framework. We solve the resulting non-convex optimization problem using an iterative algorithm. Both a real-data application and a simulation study suggest that the proposed method identifies the underlying clusters better than existing clustering methods.
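
The idea of weighting genes when computing the similarity between cells can be illustrated with a Gaussian kernel on a weighted squared distance. This sketch is an assumption-laden simplification: the kernel form, the `sigma` parameter, the function names, and the fixed hand-picked weights are hypothetical, whereas the paper learns the weights jointly with the clustering, which is not reproduced here.

```python
import math

# Hypothetical sketch: gene-weighted Gaussian similarity between cells.
# Weights are fixed by hand here; the paper estimates them during clustering.

def weighted_similarity(cell_a, cell_b, gene_weights, sigma=1.0):
    """exp(-sum_g w_g * (a_g - b_g)^2 / (2 * sigma^2))."""
    dist_sq = sum(w * (a - b) ** 2
                  for a, b, w in zip(cell_a, cell_b, gene_weights))
    return math.exp(-dist_sq / (2.0 * sigma ** 2))

def similarity_matrix(cells, gene_weights, sigma=1.0):
    """Pairwise similarities; the usual input to a spectral clustering step."""
    return [[weighted_similarity(a, b, gene_weights, sigma) for b in cells]
            for a in cells]

# Toy data: the third gene is treated as noise and given zero weight, so the
# first two cells become maximally similar despite differing on that gene.
cells = [[1, 0, 5], [1, 0, 9], [4, 3, 5]]
weights = [1.0, 1.0, 0.0]
for row in similarity_matrix(cells, weights):
    print(row)
```

Down-weighting a noisy gene changes which cells look alike, which is why the choice of weights matters for the clusters recovered from the similarity matrix.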

Dissecting Cellular Heterogeneity Using Single-Cell RNA Sequencing

  • Choi, Yoon Ha; Kim, Jong Kyoung
    • Molecules and Cells / v.42 no.3 / pp.189-199 / 2019
  • Cell-to-cell variability in gene expression exists even in a homogeneous population of cells. Dissecting such cellular heterogeneity within a biological system is a prerequisite for understanding how the system develops, is homeostatically regulated, and responds to external perturbations. Single-cell RNA sequencing (scRNA-seq) allows the quantitative and unbiased characterization of cellular heterogeneity by providing genome-wide molecular profiles from tens of thousands of individual cells. A major question in analyzing scRNA-seq data is how to account for the observed cell-to-cell variability. In this review, we provide an overview of scRNA-seq protocols, computational approaches for dissecting cellular heterogeneity, and future directions of single-cell transcriptomic analysis.

What Single Cell RNA Sequencing Has Taught Us about Chronic Obstructive Pulmonary Disease

  • Don D. Sin
    • Tuberculosis and Respiratory Diseases / v.87 no.3 / pp.252-260 / 2024
  • Chronic obstructive pulmonary disease (COPD) affects close to 400 million people worldwide and is the third leading cause of mortality. It is a heterogeneous disorder with multiple endophenotypes, each driven by specific molecular networks and processes. Therapeutic discovery in COPD has lagged behind other disease areas owing to a limited understanding of its pathobiology and a scarcity of biomarkers to guide therapies. Single-cell RNA sequencing (scRNA-seq) is a powerful new tool for identifying the cellular and molecular networks that play a crucial role in disease pathogenesis. This paper provides an overview of scRNA-seq technology, its application in COPD, and the lessons learned to date from scRNA-seq experiments in COPD.