• 제목/요약/키워드: MA-plot-based methods

검색결과 4건 처리시간 0.019초

A MA-plot-based Feature Selection by MRMR in SVM-RFE in RNA-Sequencing Data

  • Kim, Chayoung
    • 한국정보기술학회논문지
    • /
    • 제16권12호
    • /
    • pp.25-30
    • /
    • 2018
  • 유전자 규정 네트워크 (GRN)에 RNA-시퀀싱 데이터를 활용할 때, 해당 유전자와 환경과의 상호 작용에 의해서 생기는 형질들 중에서 연관성이 높은 유전자로 GRN을 구성하는 것은 상당히 어려운 일이다. 본 연구에서는 Big-Data의 RNA-시퀀싱 자료들로, 지지 벡터 머신 회귀 특징 추출(SVM-RFE) 에 근거하여, 연관성이 높은 유전자(maximum-relevancy)는 추출하고, 연관성이 낮은 유전자(minimum-redundancy)는 제거하는 MRMR 필터 방법을 집중도 의존 정규화(intensity-dependent normalization, DEGSEQ)에 기반 하여 데이터의 정밀성을 높여, 소수 연관성 높은 유전자만 판별해 내는 방법을 사용한다. 제안한 방법은 R 언어 패키지를 사용하여 편리함과 동시에, 다른 기존의 방법을 비교하였을 때, Big-Data의 시간 활용도를 높이면서, 동시에 높은 연관성 있는 유전자만을 잘 추출해 냄을 확인하였다.

In silico characterisation, homology modelling and structure-based functional annotation of blunt snout bream (Megalobrama amblycephala) Hsp70 and Hsc70 proteins

  • Tran, Ngoc Tuan;Jakovlic, Ivan;Wang, Wei-Min
    • Journal of Animal Science and Technology
    • /
    • 제57권12호
    • /
    • pp.44.1-44.9
    • /
    • 2015
  • Background: Heat shock proteins play an important role in protection from stress stimuli and metabolic insults in almost all organisms. Methods: In this study, computational tools were used to deeply analyse the physicochemical characteristics and, using homology modelling, reliably predict the tertiary structure of the blunt snout bream (Ma-) Hsp70 and Hsc70 proteins. Derived three-dimensional models were then used to predict the function of the proteins. Results: Previously published predictions regarding the protein length, molecular weight, theoretical isoelectric point and total number of positive and negative residues were corroborated. Among the new findings are: the extinction coefficient (33725/33350 and 35090/34840 - Ma-Hsp70/ Ma-Hsc70, respectively), instability index (33.68/35.56 - both stable), aliphatic index (83.44/80.23 - both very stable), half-life estimates (both relatively stable), grand average of hydropathicity (-0.431/-0.473 - both hydrophilic) and amino acid composition (alanine-lysine-glycine/glycine-lysine-aspartic acid were the most abundant, no disulphide bonds, the N-terminal of both proteins was methionine). Homology modelling was performed by SWISS-MODEL program and the proposed model was evaluated as highly reliable based on PROCHECK's Ramachandran plot, ERRAT, PROVE, Verify 3D, ProQ and ProSA analyses. Conclusions: The research revealed a high structural similarity to Hsp70 and Hsc70 proteins from several taxonomically distant animal species, corroborating a remarkably high level of evolutionary conservation among the members of this protein family. Functional annotation based on structural similarity provides a reliable additional indirect evidence for a high level of functional conservation of these two genes/proteins in blunt snout bream, but it is not sensitive enough to functionally distinguish the two isoforms.

Comparison of Normalization Methods for Defining Copy Number Variation Using Whole-genome SNP Genotyping Data

  • Kim, Ji-Hong;Yim, Seon-Hee;Jeong, Yong-Bok;Jung, Seong-Hyun;Xu, Hai-Dong;Shin, Seung-Hun;Chung, Yeun-Jun
    • Genomics & Informatics
    • /
    • 제6권4호
    • /
    • pp.231-234
    • /
    • 2008
  • Precise and reliable identification of CNV is still important to fully understand the effect of CNV on genetic diversity and background of complex diseases. SNP marker has been used frequently to detect CNVs, but the analysis of SNP chip data for identifying CNV has not been well established. We compared various normalization methods for CNV analysis and suggest optimal normalization procedure for reliable CNV call. Four normal Koreans and NA10851 HapMap male samples were genotyped using Affymetrix Genome-Wide Human SNP array 5.0. We evaluated the effect of median and quantile normalization to find the optimal normalization for CNV detection based on SNP array data. We also explored the effect of Robust Multichip Average (RMA) background correction for each normalization process. In total, the following 4 combinations of normalization were tried: 1) Median normalization without RMA background correction, 2) Quantile normalization without RMA background correction, 3) Median normalization with RMA background correction, and 4) Quantile normalization with RMA background correction. CNV was called using SW-ARRAY algorithm. We applied 4 different combinations of normalization and compared the effect using intensity ratio profile, box plot, and MA plot. When we applied median and quantile normalizations without RMA background correction, both methods showed similar normalization effect and the final CNV calls were also similar in terms of number and size. In both median and quantile normalizations, RMA backgroundcorrection resulted in widening the range of intensity ratio distribution, which may suggest that RMA background correction may help to detect more CNVs compared to no correction.

Determination of Survival of Gastric Cancer Patients With Distant Lymph Node Metastasis Using Prealbumin Level and Prothrombin Time: Contour Plots Based on Random Survival Forest Algorithm on High-Dimensionality Clinical and Laboratory Datasets

  • Zhang, Cheng;Xie, Minmin;Zhang, Yi;Zhang, Xiaopeng;Feng, Chong;Wu, Zhijun;Feng, Ying;Yang, Yahui;Xu, Hui;Ma, Tai
    • Journal of Gastric Cancer
    • /
    • 제22권2호
    • /
    • pp.120-134
    • /
    • 2022
  • Purpose: This study aimed to identify prognostic factors for patients with distant lymph node-involved gastric cancer (GC) using a machine learning algorithm, a method that offers considerable advantages and new prospects for high-dimensional biomedical data exploration. Materials and Methods: This study employed 79 features of clinical pathology, laboratory tests, and therapeutic details from 289 GC patients whose distant lymphadenopathy was presented as the first episode of recurrence or metastasis. Outcomes were measured as any-cause death events and survival months after distant lymph node metastasis. A prediction model was built based on possible outcome predictors using a random survival forest algorithm and confirmed by 5×5 nested cross-validation. The effects of single variables were interpreted using partial dependence plots. A contour plot was used to visually represent survival prediction based on 2 predictive features. Results: The median survival time of patients with GC with distant nodal metastasis was 9.2 months. The optimal model incorporated the prealbumin level and the prothrombin time (PT), and yielded a prediction error of 0.353. The inclusion of other variables resulted in poorer model performance. Patients with higher serum prealbumin levels or shorter PTs had a significantly better prognosis. The predicted one-year survival rate was stratified and illustrated as a contour plot based on the combined effect the prealbumin level and the PT. Conclusions: Machine learning is useful for identifying the important determinants of cancer survival using high-dimensional datasets. The prealbumin level and the PT on distant lymph node metastasis are the 2 most crucial factors in predicting the subsequent survival time of advanced GC.