• Title/Abstract/Keywords: RNA-seq datasets

Integration of Single-Cell RNA-Seq Datasets: A Review of Computational Methods

  • Yeonjae Ryu;Geun Hee Han;Eunsoo Jung;Daehee Hwang
    • Molecules and Cells / Vol. 46, No. 2 / pp. 106-119 / 2023
  • With the increased number of single-cell RNA sequencing (scRNA-seq) datasets in public repositories, integrative analysis of multiple scRNA-seq datasets has become commonplace. Batch effects among different datasets are inevitable because of differences in cell isolation and handling protocols, library preparation technology, and sequencing platforms. To remove these batch effects for effective integration of multiple scRNA-seq datasets, a number of methodologies have been developed based on diverse concepts and approaches. These methods have proven useful for examining whether cellular features, such as cell subpopulations and marker genes, identified from a certain dataset, are consistently present, or whether their condition-dependent variations, such as increases in cell subpopulations in particular disease-related conditions, are consistently observed in different datasets generated under similar or distinct conditions. In this review, we summarize the concepts and approaches of the integration methods and their pros and cons as reported in the previous literature.

Identification of Prostate Cancer LncRNAs by RNA-Seq

  • Hu, Cheng-Cheng;Gan, Ping;Zhang, Rui-Ying;Xue, Jin-Xia;Ran, Long-Ke
    • Asian Pacific Journal of Cancer Prevention / Vol. 15, No. 21 / pp. 9439-9444 / 2014
  • Purpose: To identify prostate cancer lncRNAs using a pipeline proposed in this study, which is applicable for the identification of lncRNAs that are differentially expressed in prostate cancer tissues but have a negligible potential to encode proteins. Materials and Methods: We used two publicly available RNA-Seq datasets from normal prostate tissue and prostate cancer. Putative lncRNAs were predicted using the biological technology; prostate cancer specific lncRNAs were then found by differential expression analysis, and a co-expression network was constructed by weighted gene co-expression network analysis. Results: A total of 1,080 lncRNA transcripts were obtained in the RNA-Seq datasets. Three genes (PCA3, C20orf166-AS1 and RP11-267A15.1) showed significant differential expression in the prostate cancer tissues, and were thus identified as prostate cancer specific lncRNAs. Brown and black modules had significant negative and positive correlations with prostate cancer, respectively. Conclusions: The pipeline proposed in this study is useful for the prediction of prostate cancer specific lncRNAs. Three genes (PCA3, C20orf166-AS1, and RP11-267A15.1) were identified as having significant differential expression in prostate cancer tissues. However, there have been no published studies to demonstrate the specificity of RP11-267A15.1 in prostate cancer tissues. Thus, the results of this study can provide new theoretical insight into the identification of prostate cancer specific genes.

Single-cell and spatial transcriptomics approaches of cardiovascular development and disease

  • Roth, Robert;Kim, Soochi;Kim, Jeesu;Rhee, Siyeon
    • BMB Reports / Vol. 53, No. 8 / pp. 393-399 / 2020
  • Recent advancements in the resolution and throughput of single-cell analyses, including single-cell RNA sequencing (scRNA-seq), have achieved significant progress in biomedical research in the last decade. These techniques have been used to understand cellular heterogeneity by identifying many rare and novel cell types and characterizing subpopulations of cells that make up organs and tissues. Analysis across various datasets can elucidate temporal patterning in gene expression and developmental cues and is also employed to examine the response of cells to acute injury, damage, or disruption. Specifically, scRNA-seq and spatially resolved transcriptomics have been used to describe the identity of novel or rare cell subpopulations and transcriptional variations that are related to normal and pathological conditions in mammalian models and human tissues. These applications have critically contributed to advancing basic cardiovascular research in the past decade by identifying novel cell types implicated in development and disease. In this review, we describe current scRNA-seq technologies and how current scRNA-seq and spatial transcriptomic (ST) techniques have advanced our understanding of cardiovascular development and disease.

Integration and Reanalysis of Four RNA-Seq Datasets Including BALF, Nasopharyngeal Swabs, Lung Biopsy, and Mouse Models Reveals Common Immune Features of COVID-19

  • Rudi Alberts;Sze Chun Chan;Qian-Fang Meng;Shan He;Lang Rao;Xindong Liu;Yongliang Zhang
    • IMMUNE NETWORK / Vol. 22, No. 3 / pp. 22.1-22.25 / 2022
  • Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread over the world, causing a pandemic that has been ongoing since its emergence in late 2019. A great amount of effort has been devoted to understanding the pathogenesis of COVID-19 with the hope of developing better therapeutic strategies. Transcriptome analysis using technologies such as RNA sequencing became a commonly used approach in the study of host immune responses to SARS-CoV-2. Although a substantial amount of information can be gathered from transcriptome analysis, different analysis tools used in these studies may lead to conclusions that differ dramatically from each other. Here, we re-analyzed four RNA-sequencing datasets of COVID-19 samples including human bronchoalveolar lavage fluid, nasopharyngeal swabs, lung biopsy and hACE2 transgenic mice using the same standardized method. The results showed that common features of COVID-19 include upregulation of chemokines including CCL2, CXCL1, and CXCL10, inflammatory cytokine IL-1β and alarmin S100A8/S100A9, which are associated with dysregulated innate immunity marked by abundant neutrophil and mast cell accumulation. Downregulation of chemokine receptor genes that are associated with impaired adaptive immunity, such as lymphopenia, is another common feature of COVID-19 observed. In addition, a few interferon-stimulated genes but no type I IFN genes were identified to be enriched in COVID-19 samples compared to their respective controls in these datasets. These features are in line with results from single-cell RNA sequencing studies in the field. Therefore, our re-analysis of the RNA-seq datasets revealed common features of dysregulated immune responses to SARS-CoV-2 and shed light on the pathogenesis of COVID-19.

A Study on Predicting Lung Cancer Using RNA-Sequencing Data with Ensemble Learning

  • Geon AN;JooYong PARK
    • Journal of Korea Artificial Intelligence Association / Vol. 2, No. 1 / pp. 7-14 / 2024
  • In this paper, we explore the application of RNA-sequencing data and ensemble machine learning to predict lung cancer and treatment strategies for lung cancer, a leading cause of cancer mortality worldwide. The research utilizes Random Forest, XGBoost, and LightGBM models to analyze gene expression profiles from extensive datasets, aiming to enhance predictive accuracy for lung cancer prognosis. The methodology focuses on preprocessing RNA-seq data to standardize expression levels across samples and applying ensemble algorithms to maximize prediction stability and reduce model overfitting. Key findings indicate that ensemble models, especially XGBoost, substantially outperform traditional predictive models. Significant genetic markers, such as ADGRF5, are identified as crucial for predicting lung cancer outcomes. In conclusion, ensemble learning using RNA-seq data proves highly effective in predicting lung cancer, suggesting a potential shift towards more precise and personalized treatment approaches. The results advocate for further integration of molecular and clinical data to refine diagnostic models and improve clinical outcomes, underscoring the critical role of advanced molecular diagnostics in enhancing patient survival rates and quality of life. This study lays the groundwork for future research in the application of RNA-sequencing data and ensemble machine learning techniques in clinical settings.

FusionScan: accurate prediction of fusion genes from RNA-Seq data

  • Kim, Pora;Jang, Ye Eun;Lee, Sanghyuk
    • Genomics & Informatics / Vol. 17, No. 3 / pp. 26.1-26.12 / 2019
  • Identification of fusion genes is of prominent importance in the cancer research field because of their potential as carcinogenic drivers. RNA sequencing (RNA-Seq) data have been the most useful source for identification of fusion transcripts. Although a number of algorithms have been developed thus far, most programs produce too many false-positives, thus making experimental confirmation almost impossible. We still lack a reliable program that achieves high precision with a reasonable recall rate. Here, we present FusionScan, a highly optimized tool for predicting fusion transcripts from RNA-Seq data. We specifically search for split reads composed of intact exons at the fusion boundaries. Using 269 known fusion cases as the reference, we have implemented various mapping and filtering strategies to remove false-positives without discarding genuine fusions. In the performance test using three cell line datasets with validated fusion cases (NCI-H660, K562, and MCF-7), FusionScan outperformed other existing programs by a considerable margin, achieving precision and recall rates of 60% and 79%, respectively. A simulation test also demonstrated that FusionScan recovered most of the true positives without producing an overwhelming number of false-positives regardless of sequencing depth and read length. The computation time was comparable to other leading tools. We also provide several curative means to help users investigate the details of fusion candidates easily. We believe that FusionScan would be a reliable, efficient and convenient program for detecting fusion transcripts that meets the requirements of the clinical and experimental community. FusionScan is freely available at http://fusionscan.ewha.ac.kr/.
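
As a reading aid, the precision and recall quoted in this abstract can be folded into a single F1 score (their harmonic mean). The abstract itself does not report an F1 value; the function name `f1_score` below is illustrative and is not part of FusionScan.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    total = precision + recall
    if total == 0:
        return 0.0
    return 2 * precision * recall / total


# Values taken from the abstract: precision 60%, recall 79%.
fusionscan_f1 = f1_score(0.60, 0.79)
print(f"F1 = {fusionscan_f1:.2f}")
```

The harmonic mean penalizes an imbalance between the two rates, which is why it is often preferred over a simple average when comparing fusion callers that trade precision for recall.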

A demonstration of the H3 trimethylation ChIP-seq analysis of galline follicular mesenchymal cells and male germ cells

  • Chokeshaiusaha, Kaj;Puthier, Denis;Nguyen, Catherine;Sananmuang, Thanida
    • Asian-Australasian Journal of Animal Sciences / Vol. 31, No. 6 / pp. 791-797 / 2018
  • Objective: Trimethylation of histone 3 (H3) at the 4th lysine of the N-terminal tail (H3K4me3) in gene promoter regions is a universal marker of active genes specific to a cell lineage. In contrast, coexistence of trimethylation at the 27th lysine (H3K27me3) in the same loci, the bivalent H3K4me3/H3K27me3 state, is known to suspend gene transcription in germ cells and can also be inherited by the developed stem cell. In galline species, a thorough example of H3K4me3 and H3K27me3 ChIP-seq analysis has not yet been provided. We therefore designed and demonstrated such procedures using ChIP-seq and mRNA-seq data of chicken follicular mesenchymal cells and male germ cells. Methods: An analytical workflow was designed and provided in this study. ChIP-seq and RNA-seq datasets of follicular mesenchymal cells and male germ cells were acquired and properly preprocessed. Peak calling by Model-based Analysis of ChIP-seq 2 (MACS2) was performed to identify H3K4me3 or H3K27me3 enriched regions (fold-change ≥ 2, FDR ≤ 0.01) in gene promoter regions. Integrative Genomics Viewer was utilized for cellular retinoic acid binding protein 1 (CRABP1), growth differentiation factor 10 (GDF10), and gremlin 1 (GREM1) gene explorations. Results: The acquired results indicated that follicular mesenchymal cells and germ cells shared several unique gene promoter regions enriched with H3K4me3 (5,704 peaks) and also unique regions of bivalent H3K4me3/H3K27me3 shared between all cell types and germ cells (1,909 peaks). Subsequent observation of the follicular mesenchyme-specific genes CRABP1, GDF10, and GREM1 correctly revealed vigorous transcription of these genes in follicular mesenchymal cells. As expected, the bivalent H3K4me3/H3K27me3 pattern was manifested in gene promoter regions of germ cells, and thus suspended their transcription. Conclusion: According to the results, an example of chicken H3K4me3/H3K27me3 ChIP-seq data analysis was successfully demonstrated in this study. Hopefully, the provided methodology will be useful for galline ChIP-seq data analysis in the future.
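
The enrichment thresholds quoted in this abstract (fold-change of at least 2, FDR of at most 0.01) amount to a simple filter over called peaks. The sketch below is only an illustration of that filter: the `Peak` record and its field names are hypothetical and do not mirror the output format of any particular peak caller or the authors' workflow.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    """Hypothetical peak record; field names are assumptions for illustration."""
    name: str
    fold_change: float
    fdr: float


def filter_peaks(peaks, min_fold_change=2.0, max_fdr=0.01):
    """Keep peaks with fold-change >= min_fold_change and FDR <= max_fdr."""
    return [
        p for p in peaks
        if p.fold_change >= min_fold_change and p.fdr <= max_fdr
    ]


# Toy input, not real data.
example_peaks = [
    Peak("peak_a", fold_change=3.5, fdr=0.001),   # passes both thresholds
    Peak("peak_b", fold_change=1.8, fdr=0.001),   # fold-change too low
    Peak("peak_c", fold_change=4.0, fdr=0.05),    # FDR too high
]
kept = filter_peaks(example_peaks)
print([p.name for p in kept])
```

Both comparisons are inclusive, matching the "≥" and "≤" in the stated cutoffs.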

Analyses of alternative polyadenylation: from old school biochemistry to high-throughput technologies

  • Yeh, Hsin-Sung;Zhang, Wei;Yong, Jeongsik
    • BMB Reports / Vol. 50, No. 4 / pp. 201-207 / 2017
  • Alterations in the usage of polyadenylation sites during transcription termination yield transcript isoforms from a gene. Recent findings of transcriptome-wide alternative polyadenylation (APA) as a molecular response to changes in biology position APA not only as a molecular event of early transcriptional termination but also as a cellular regulatory step affecting various biological pathways. With the development of high-throughput profiling technologies at a single nucleotide level and their applications targeted to the 3'-end of mRNAs, dynamics in the landscape of mRNA 3'-ends are measurable at a global scale. In this review, methods and technologies that have been adopted to study APA events are discussed. In addition, various bioinformatics algorithms for APA isoform analysis using publicly available RNA-seq datasets are introduced.

CDRgator: An Integrative Navigator of Cancer Drug Resistance Gene Signatures

  • Jang, Su-Kyeong;Yoon, Byung-Ha;Kang, Seung Min;Yoon, Yeo-Gha;Kim, Seon-Young;Kim, Wankyu
    • Molecules and Cells / Vol. 42, No. 3 / pp. 237-244 / 2019
  • Understanding the mechanisms of cancer drug resistance is a critical challenge in cancer therapy. For many cancer drugs, various resistance mechanisms have been identified such as target alteration, alternative signaling pathways, epithelial-mesenchymal transition, and epigenetic modulation. Resistance may arise via multiple mechanisms even for a single drug, making it necessary to investigate multiple independent models for comprehensive understanding and therapeutic application. In particular, we hypothesize that different resistance processes result in distinct gene expression changes. Here, we present a web-based database, CDRgator (Cancer Drug Resistance navigator), for comparative analysis of gene expression signatures of cancer drug resistance. Resistance signatures were extracted from two different types of datasets. First, resistance signatures were extracted from transcriptomic profiles of cancer cells or patient samples and their resistance-induced counterparts for >30 cancer drugs. Second, drug resistance group signatures were also extracted from two large-scale drug sensitivity datasets representing ~1,000 cancer cell lines. All the datasets are available for download, and are conveniently accessible based on drug class and cancer type, along with analytic features such as clustering analysis, multidimensional scaling, and pathway analysis. CDRgator allows meta-analysis of independent resistance models for a more comprehensive understanding of drug-resistance mechanisms, which is difficult to accomplish with individual datasets alone (database URL: http://cdrgator.ewha.ac.kr).

Investigation of Splicing Quantitative Trait Loci in Arabidopsis thaliana

  • Yoo, Wonseok;Kyung, Sungkyu;Han, Seonggyun;Kim, Sangsoo
    • Genomics & Informatics / Vol. 14, No. 4 / pp. 211-215 / 2016
  • The alteration of alternative splicing patterns has an effect on the quantification of functional proteins, leading to phenotype variation. The splicing quantitative trait locus (sQTL) is one of the main genetic elements affecting splicing patterns. Here, we report the results of genome-wide sQTLs across 141 strains of Arabidopsis thaliana with publicly available next generation sequencing datasets. As a result, we found 1,694 candidate sQTLs in Arabidopsis thaliana at a false discovery rate of 0.01. Furthermore, among the candidate sQTLs, we found 25 sQTLs that overlapped with the list of previously examined trait-associated single nucleotide polymorphisms (SNPs). In summary, this sQTL analysis provides new insight into genetic elements affecting alternative splicing patterns in Arabidopsis thaliana and the mechanism of previously reported trait-associated SNPs.
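
To make the "false discovery rate of 0.01" in this abstract concrete, here is a minimal sketch of one common way to control FDR, the Benjamini-Hochberg step-up procedure. This is an assumption for illustration only: the abstract does not state which FDR procedure the authors used, and the function name and toy p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.01):
    """Return a list of booleans marking which tests are significant at FDR alpha.

    Benjamini-Hochberg step-up: sort the p-values, find the largest rank k
    such that p_(k) <= (k / m) * alpha, and call the k smallest p-values
    significant.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_passing_rank = rank

    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_passing_rank:
            significant[idx] = True
    return significant


# Toy p-values, not taken from the study.
toy_p_values = [0.5, 0.0001, 0.03, 0.004]
print(benjamini_hochberg(toy_p_values, alpha=0.01))
```

With four tests at alpha = 0.01, the per-rank thresholds are 0.0025, 0.005, 0.0075, and 0.01, so only the two smallest toy p-values are called significant.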