• Title/Summary/Keyword: RNA sequencing Big-Data

A MA-plot-based Feature Selection by MRMR in SVM-RFE in RNA-Sequencing Data

  • Kim, Chayoung
    • The Journal of Korean Institute of Information Technology, v.16 no.12, pp.25-30, 2018
  • Methods for constructing Gene Regulatory Networks (GRNs) from RNA-Sequencing (RNA-Seq) data are extremely lacking and urgently required in the Big-Data setting, where GRNs have attracted substantial attention as a way to describe the interactions among relevant featured genes and their regulation. We propose a new computational method for comparative feature-pattern selection that applies a minimum-redundancy maximum-relevancy (MRMR) filter within support vector machine-recursive feature elimination (SVM-RFE), with intensity-dependent normalization (DEGSEQ) as a preprocessor to emphasize equal preciseness in RNA-seq Big-Data. We found that the proposed algorithm may be more scalable and convenient because all of its libraries are available as R packages, and that it may improve both the time consumed on Big-Data and the minimum-redundancy maximum-relevancy of the selected set of feature patterns.
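
The MRMR filter named in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation (which is described as R-based); it assumes absolute Pearson correlation as a stand-in for both relevance and redundancy, whereas MRMR is usually defined with mutual information. The gene names and values are hypothetical toy data.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)

def mrmr_select(features, labels, k):
    """Greedy MRMR: repeatedly add the feature with the best
    relevance-minus-mean-redundancy score (correlation-based assumption)."""
    relevance = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        best, best_score = None, float("-inf")
        for name, vals in features.items():
            if name in selected:
                continue
            # Redundancy is the mean |correlation| with already selected features.
            redundancy = (
                mean([abs(pearson(vals, features[s])) for s in selected])
                if selected else 0.0
            )
            score = relevance[name] - redundancy
            if score > best_score:
                best, best_score = name, score
        selected.append(best)
    return selected

# Hypothetical toy data: geneB is a shifted copy of geneA, geneC is less redundant.
features = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [1.1, 2.1, 3.1, 4.1, 5.1, 6.1],
    "geneC": [2.0, 1.0, 2.5, 4.0, 3.0, 5.5],
}
labels = [0, 0, 0, 1, 1, 1]
selected = mrmr_select(features, labels, 2)
```

In this toy case the second pick is geneC rather than the near-duplicate of the first pick, which is the behavior the redundancy penalty is meant to produce.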

Combining Support Vector Machine Recursive Feature Elimination and Intensity-dependent Normalization for Gene Selection in RNAseq (RNAseq 빅데이터에서 유전자 선택을 위한 밀집도-의존 정규화 기반의 서포트-벡터 머신 병합법)

  • Kim, Chayoung
    • Journal of Internet Computing and Services, v.18 no.5, pp.47-53, 2017
  • In the past few years, high-throughput sequencing, big-data generation, cloud computing, and computational biology have been revolutionary. RNA sequencing is emerging as an attractive alternative to DNA microarrays, yet methods for constructing Gene Regulatory Networks (GRNs) from RNA-Seq are extremely lacking and urgently required. Because GRNs have attracted substantial attention in genomics and bioinformatics, an elementary requirement of a GRN has been to maximize the distinguishable genes. Although RNA sequencing techniques generate large amounts of data, there are few computational methods that exploit them. We therefore suggest a novel gene selection algorithm combining Support Vector Machines and intensity-dependent normalization, which uses the log differential expression ratio in RNAseq. It is an extended variation of the support vector machine recursive feature elimination (SVM-RFE) algorithm. The algorithm accomplishes minimum relevancy with subsets of Big-Data, such as NCBI-GEO. The proposed algorithm was compared with an existing one that uses gene expression profiling from DNA microarrays, and the comparison was based on the number of genes selected in RNAseq Big-Data. The proposed algorithm proved more convenient and quicker than the previous one because all of its functions come from R packages, and it improved both the classification accuracy based on gene ontology and the time consumed on Big-Data.
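
The log differential expression ratio mentioned above is the M value of an MA plot, paired with the mean log intensity A. A minimal sketch follows, assuming a pseudocount of 1 to avoid taking the log of zero (an illustrative choice, not taken from the paper) and hypothetical read counts.

```python
import math

def ma_values(count_a, count_b, pseudocount=1.0):
    """Return (M, A) for one gene.

    M is the log2 expression ratio between the two conditions and
    A is the mean log2 intensity. The pseudocount is an assumption.
    """
    log_a = math.log2(count_a + pseudocount)
    log_b = math.log2(count_b + pseudocount)
    return log_a - log_b, 0.5 * (log_a + log_b)

# Hypothetical read counts (condition A, condition B) for three genes.
counts = {"gene1": (255, 63), "gene2": (15, 15), "gene3": (7, 31)}
ma = {gene: ma_values(a, b) for gene, (a, b) in counts.items()}
```

Intensity-dependent normalization then adjusts M as a function of A, so a gene is judged against other genes of similar overall intensity.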

A Study on Predicting Lung Cancer Using RNA-Sequencing Data with Ensemble Learning (앙상블 기법을 활용한 RNA-Sequencing 데이터의 폐암 예측 연구)

  • Geon AN;JooYong PARK
    • Journal of Korea Artificial Intelligence Association, v.2 no.1, pp.7-14, 2024
  • In this paper, we explore the application of RNA-sequencing data and ensemble machine learning to predict lung cancer and treatment strategies for lung cancer, a leading cause of cancer mortality worldwide. The research utilizes Random Forest, XGBoost, and LightGBM models to analyze gene expression profiles from extensive datasets, aiming to enhance predictive accuracy for lung cancer prognosis. The methodology focuses on preprocessing RNA-seq data to standardize expression levels across samples and applying ensemble algorithms to maximize prediction stability and reduce model overfitting. Key findings indicate that ensemble models, especially XGBoost, substantially outperform traditional predictive models. Significant genetic markers such as ADGRF5 are identified as crucial for predicting lung cancer outcomes. In conclusion, ensemble learning using RNA-seq data proves highly effective in predicting lung cancer, suggesting a potential shift towards more precise and personalized treatment approaches. The results advocate for further integration of molecular and clinical data to refine diagnostic models and improve clinical outcomes, underscoring the critical role of advanced molecular diagnostics in enhancing patient survival rates and quality of life. This study lays the groundwork for future research in the application of RNA-sequencing data and ensemble machine learning techniques in clinical settings.
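
The abstract does not say how expression levels are standardized across samples. One common choice, assumed here purely for illustration, is a per-gene z-score; the gene names and values below are hypothetical.

```python
from statistics import mean, pstdev

def standardize_gene(values):
    """Z-score one gene's expression values across samples.

    A gene with no variation across samples is mapped to all zeros
    so that it does not cause a division by zero.
    """
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical expression values for two genes across four samples.
expression = {"geneA": [2.0, 4.0, 6.0, 8.0], "geneB": [5.0, 5.0, 5.0, 5.0]}
standardized = {gene: standardize_gene(vals) for gene, vals in expression.items()}
```

After this step every varying gene has mean 0 and unit variance across samples. This puts genes on a common scale but is not required by tree-based models such as Random Forest or XGBoost.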

Big Data Analytics in RNA-sequencing (RNA 시퀀싱 기법으로 생성된 빅데이터 분석)

  • Sung-Hun WOO;Byung Chul JUNG
    • Korean Journal of Clinical Laboratory Science, v.55 no.4, pp.235-243, 2023
  • As next-generation sequencing has been developed and used widely, RNA-sequencing (RNA-seq) has rapidly emerged as the first-choice tool for global transcriptome profiling. With the significant advances in RNA-seq, various types of RNA-seq have evolved in conjunction with the progress in bioinformatic tools. On the other hand, it is difficult to interpret the biological meaning underlying the complex data without a general understanding of the types of RNA-seq and bioinformatic approaches. In this regard, this paper discusses two main aspects of RNA-seq. First, two major variants of RNA-seq are described and compared with the standard RNA-seq. This provides insights into which RNA-seq method is most appropriate for a given study. Second, the most widely used RNA-seq data analyses are discussed: (1) exploratory data analysis and (2) pathway enrichment analysis. This paper introduces the most widely used exploratory data analyses for RNA-seq, such as principal component analysis, heatmaps, and volcano plots, which can reveal the overall trends in the dataset. The pathway enrichment analysis section introduces three generations of pathway enrichment analysis and how they generate enriched pathways from the RNA-seq dataset.
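
Of the exploratory analyses listed above, the volcano plot is the simplest to sketch: each gene is placed at (log2 fold change, -log10 p-value) and flagged when it passes both thresholds. The cut-offs (|log2FC| >= 1, p <= 0.05), the pseudocount, and the gene values below are illustrative assumptions, not values from the paper.

```python
import math

def volcano_point(mean_treated, mean_control, p_value, pseudocount=1.0):
    """Return the (x, y) coordinates of one gene on a volcano plot."""
    lfc = math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))
    return lfc, -math.log10(p_value)

def classify(lfc, neg_log_p, lfc_cut=1.0, p_cut=0.05):
    """Label a gene 'up', 'down', or 'ns' (not significant)."""
    if neg_log_p < -math.log10(p_cut) or abs(lfc) < lfc_cut:
        return "ns"
    return "up" if lfc > 0 else "down"

# Hypothetical (treated mean, control mean, p-value) per gene.
genes = {
    "gene1": (63, 15, 0.001),
    "gene2": (15, 63, 0.01),
    "gene3": (20, 19, 0.5),
    "gene4": (63, 15, 0.2),
}
calls = {}
for name, (treated, control, p) in genes.items():
    lfc, neg_log_p = volcano_point(treated, control, p)
    calls[name] = classify(lfc, neg_log_p)
```

Here gene4 has a large fold change but stays "ns" because its p-value misses the cut-off, which is why the plot uses both axes.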

The Workflow for Computational Analysis of Single-cell RNA-sequencing Data (단일 세포 RNA 시퀀싱 데이터에 대한 컴퓨터 분석의 작업과정)

  • Sung-Hun WOO;Byung Chul JUNG
    • Korean Journal of Clinical Laboratory Science, v.56 no.1, pp.10-20, 2024
  • RNA-sequencing (RNA-seq) is a technique used for providing global patterns of transcriptomes in samples. However, it can only provide the average gene expression across cells and does not address the heterogeneity within the samples. The advances in single-cell RNA sequencing (scRNA-seq) technology have revolutionized our understanding of heterogeneity and the dynamics of gene expression at the single-cell level. For example, scRNA-seq allows us to identify the cell types in complex tissues, which can provide information regarding the alteration of the cell population by perturbations, such as genetic modification. Since its initial introduction, scRNA-seq has rapidly become popular, leading to the development of a huge number of bioinformatic tools. However, the analysis of the big dataset generated from scRNA-seq requires a general understanding of the preprocessing of the dataset and a variety of analytical techniques. Here, we present an overview of the workflow involved in analyzing the scRNA-seq dataset. First, we describe the preprocessing of the dataset, including quality control, normalization, and dimensionality reduction. Then, we introduce the downstream analysis provided with the most commonly used computational packages. This review aims to provide a workflow guideline for new researchers interested in this field.
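
The first two preprocessing steps named above, quality control and normalization, can be sketched in plain Python. The thresholds (at least 2 detected genes per cell, scaling to 10,000 counts per cell, then log1p) are assumptions that mirror common defaults in single-cell toolkits, and the count matrix is hypothetical. Dimensionality reduction is left out to keep the example short.

```python
import math

def filter_cells(cells, min_genes=2):
    """Quality control: keep cells with at least min_genes genes detected."""
    return {
        cell_id: counts
        for cell_id, counts in cells.items()
        if sum(1 for v in counts.values() if v > 0) >= min_genes
    }

def log_normalize(counts, target_sum=10_000):
    """Scale one cell's counts to target_sum total, then apply log(1 + x)."""
    total = sum(counts.values())
    return {gene: math.log1p(v / total * target_sum) for gene, v in counts.items()}

# Hypothetical raw counts: cell2 is a low-quality cell with one detected gene.
cells = {
    "cell1": {"g1": 5, "g2": 3, "g3": 2},
    "cell2": {"g1": 0, "g2": 1, "g3": 0},
    "cell3": {"g1": 50, "g2": 30, "g3": 20},
}
kept = filter_cells(cells)
normalized = {cell_id: log_normalize(counts) for cell_id, counts in kept.items()}
```

cell1 and cell3 differ tenfold in sequencing depth but have the same composition, so they end up with identical normalized profiles.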

Variable Selection of Feature Pattern using SVM-based Criterion with Q-Learning in Reinforcement Learning (SVM-기반 제약 조건과 강화학습의 Q-learning을 이용한 변별력이 확실한 특징 패턴 선택)

  • Kim, Chayoung
    • Journal of Internet Computing and Services, v.20 no.4, pp.21-27, 2019
  • Feature patterns gathered from RNA sequencing (RNA-seq) data are not all equally informative for identifying differential expression: some may be noisy, correlated, or irrelevant because of redundancy in Big-Data sets. Variable selection of feature patterns aims at a differentially expressed gene set that is significantly relevant for a specific task. This issue is complex and important in many domains. In machine learning, feature-pattern selection has been studied with methods such as Random Forest, K-Nearest, and Support Vector Machine (SVM). SVM is one of the most well-known machine learning algorithms, both classical and original. One member of the SVM-criterion family is Support Vector Machine-Recursive Feature Elimination (SVM-RFE), which we use in our work. We propose a novel algorithm that combines SVM-RFE with Q-learning from reinforcement learning for better variable selection of feature patterns. A comparison of the proposed algorithm with the well-known SVM-RFE combined with Welch's t-test on published data shows that the criterion derived from the SVM-RFE weight vector is improved by Q-learning, an off-policy method with a more exploratory scheme.
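
The abstract does not describe how states, actions, and rewards are mapped onto feature selection, so the sketch below shows only the standard tabular Q-learning update. The state and action names, learning rate, and discount factor are hypothetical. The update is off-policy because its target uses the best next action, whatever action the exploring agent actually takes next.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    Missing table entries are treated as 0.0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]

# Hypothetical setup: an action keeps or drops a candidate feature.
actions = ["keep", "drop"]
q_table = {("s1", "keep"): 1.0}
new_value = q_update(q_table, "s0", "drop", reward=1.0, next_state="s1", actions=actions)
```

With these numbers the new estimate is 0.5 * (1.0 + 0.9 * 1.0) = 0.95.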

Non-invasive evaluation of embryo quality for the selection of transferable embryos in human in vitro fertilization-embryo transfer

  • Jihyun Kim;Jaewang Lee;Jin Hyun Jun
    • Clinical and Experimental Reproductive Medicine, v.49 no.4, pp.225-238, 2022
  • The ultimate goal of human assisted reproductive technology is to achieve a healthy pregnancy and birth, ideally from the selection and transfer of a single competent embryo. Recently, techniques for efficiently evaluating the state and quality of preimplantation embryos using time-lapse imaging systems have been applied. Artificial intelligence programs based on deep learning technology and big data analysis of time-lapse monitoring systems during in vitro culture of preimplantation embryos have also been rapidly developed. In addition, several molecular markers of the secretome have been successfully analyzed in spent embryo culture media, which can easily be obtained during in vitro embryo culture. It is also possible to analyze small amounts of cell-free nucleic acids, mitochondrial nucleic acids, miRNA, and long non-coding RNA derived from embryos using real-time polymerase chain reaction (PCR) or digital PCR, as well as next-generation sequencing. Various efforts are being made to use non-invasive evaluation of embryo quality (NiEEQ) to select the embryo with the best developmental competence. However, each NiEEQ method has some limitations that should be evaluated case by case. Therefore, an integrated analysis strategy fusing several NiEEQ methods should be urgently developed and confirmed by proper clinical trials.

Qualitative and Quantitative Analysis for Microbiome Data Matching between Objects (마이크로바이옴 데이터 일치를 위한 물체들 사이의 정량 및 정성적 분석)

  • You, Hee Sang;Ok, Yeon Jeong;Lee, Song Hee;Lee, So Lip;Lee, Young Ju;Lee, Min Ho;Hyun, Sung Hee
    • Korean Journal of Clinical Laboratory Science, v.52 no.3, pp.202-213, 2020
  • Although technological advances have allowed the efficient collection of large amounts of microbiome data for microbiological studies, proper analysis tools for such big data are still lacking. Additionally, analyses of microbial communities using poor databases can lead to misleading results. Hence, this study aimed to design an appropriate method for the analysis of big microbial databases. Bacteria were collected from the fingertips and personal belongings (mobile phones and laptop keyboards) of individuals. The genomic DNA was extracted from these bacteria and subjected to next-generation sequencing by targeting the 16S rRNA gene. The accuracy of the bacterial matching percentage between the fingertips and personal belongings was verified using a formula and an environment-related and human-related database. To design appropriate analysis, the bacterial matching accuracy was calculated based on the following three categories: comparison between qualitative and quantitative analysis, comparisons within same-gender participants as well as all participants regardless of gender, and comparison between the use of a human-related bacterial database (hDB) and environment-related bacterial database (eDB). The results showed that qualitative analysis, comparisons within same-gender participants, and the use of hDB provided relatively accurate results. This study provides an analytical method to obtain accurate results when conducting studies involving big microbiological data using human-derived microorganisms.

Prebiotics enhance the biotransformation and bioavailability of ginsenosides in rats by modulating gut microbiota

  • Zhang, Xiaoyan;Chen, Sha;Duan, Feipeng;Liu, An;Li, Shaojing;Zhong, Wen;Sheng, Wei;Chen, Jun;Xu, Jiang;Xiao, Shuiming
    • Journal of Ginseng Research, v.45 no.2, pp.334-343, 2021
  • Background: Gut microbiota mainly function in the biotransformation of primary ginsenosides into bioactive metabolites. Herein, we investigated the effects of three prebiotic fibers, acting through the gut microbiota, on the metabolism of ginsenoside Rb1 in vivo. Methods: Sprague Dawley rats were administered ginsenoside Rb1 after a two-week prebiotic intervention with fructooligosaccharide, galactooligosaccharide, or fibersol-2. Pharmacokinetic analysis of ginsenoside Rb1 and its metabolites was performed, whilst the microbial composition and metabolic function of gut microbiota were examined by 16S rRNA gene amplicon and metagenomic shotgun sequencing. Results: The peak plasma concentration and area under the concentration-time curve of ginsenoside Rb1 and its intermediate metabolites, ginsenoside Rd, F2, and compound K (CK), were increased to various degrees in the prebiotic intervention groups compared with the control group. Gut microbiota responded dramatically to the prebiotic treatment at both taxonomical and functional levels. The abundance of Prevotella, which has the potential to hydrolyze ginsenoside Rb1 into CK, was significantly elevated in the three prebiotic groups (P < 0.05). The gut metagenomic analysis also revealed functional gene enrichment for terpenoid/polyketide metabolism, glycolysis, gluconeogenesis, propanoate metabolism, etc. Conclusion: These findings imply that prebiotics may selectively promote the proliferation of certain bacterial strains with glycoside hydrolysis capacity, thereby improving the biotransformation and bioavailability of primary ginsenosides in vivo.
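
The two pharmacokinetic quantities reported above, peak plasma concentration (Cmax) and area under the concentration-time curve (AUC), can be computed from a sampled profile as sketched below. The linear trapezoidal rule is assumed for the AUC because the abstract does not state the method, and the time points and concentrations are hypothetical.

```python
def cmax(concentrations):
    """Peak observed concentration."""
    return max(concentrations)

def auc_trapezoid(times, concentrations):
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    points = list(zip(times, concentrations))
    return sum(
        (t1 - t0) * (c0 + c1) / 2
        for (t0, c0), (t1, c1) in zip(points, points[1:])
    )

# Hypothetical plasma profile: sampling times in hours, concentrations in ng/mL.
times = [0, 1, 2, 4, 8]
concentrations = [0.0, 4.0, 6.0, 3.0, 1.0]
peak = cmax(concentrations)
auc = auc_trapezoid(times, concentrations)
```

For this profile the four trapezoids contribute 2, 5, 9, and 8, giving an AUC of 24 ng·h/mL and a Cmax of 6 ng/mL.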