Search | Korea Science

A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

Aydadenta, Husna;Adiwijaya, Adiwijaya
- Journal of Information Processing Systems / v.14 no.5 / pp.1167-1175 / 2018
Microarray data play an essential role in diagnosing and detecting cancer. Microarray analysis allows the examination of gene expression levels in specific cell samples, where thousands of genes can be analyzed simultaneously. However, microarray data have very few samples and very high dimensionality. Therefore, classifying microarray data requires a dimensionality reduction process. Dimensionality reduction can eliminate redundancy in the data, so that the features used in classification are only those highly correlated with their class. There are two types of dimensionality reduction, namely feature selection and feature extraction. In this paper, we used the k-means algorithm as the clustering approach for feature selection. The proposed approach groups features that have the same characteristics into one cluster, so that redundancy in microarray data is removed. The clustering result is ranked using the Relief algorithm so that the best-scoring element of each cluster is obtained. The best elements of all clusters are selected and used as features in the classification process, which is performed with the Random Forest algorithm. Based on the simulation, the proposed approach achieved accuracies of 85.87%, 98.9%, and 89% on the Colon, Lung Cancer, and Prostate Tumor datasets, respectively. The accuracy of the proposed approach is therefore higher than that of Random Forest without clustering.
https://doi.org/10.3745/JIPS.04.0087
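
The selection step described in this abstract (group features into clusters, score them, keep the top-scoring feature of each cluster) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the cluster labels (for example from k-means) and the relevance scores (for example from Relief) have already been computed, and the function name `best_feature_per_cluster` and the toy values are hypothetical.

```python
def best_feature_per_cluster(cluster_labels, scores):
    """Return indices of the highest-scoring feature in each cluster.

    cluster_labels[i] is the cluster id assigned to feature i
    (assumed to come from a clustering step such as k-means);
    scores[i] is a relevance score for feature i
    (assumed to come from a ranking step such as Relief).
    """
    if len(cluster_labels) != len(scores):
        raise ValueError("cluster_labels and scores must have equal length")

    best = {}  # cluster id -> index of the best feature seen so far
    for idx, (label, score) in enumerate(zip(cluster_labels, scores)):
        # Strict '>' means the first feature wins when scores tie.
        if label not in best or score > scores[best[label]]:
            best[label] = idx
    # Sorted so the selected feature indices are in column order.
    return sorted(best.values())


# Hypothetical toy input: six features grouped into three clusters.
labels = [0, 0, 1, 1, 2, 2]
relevance = [0.10, 0.40, 0.30, 0.20, 0.05, 0.50]
selected = best_feature_per_cluster(labels, relevance)
print(selected)
```

The selected indices would then be used to slice the expression matrix before training a classifier; one feature per cluster is kept so that near-duplicate features do not enter the model together.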

Normal Mixture Model with General Linear Regressive Restriction: Applied to Microarray Gene Clustering

Kim, Seung-Gu
- Communications for Statistical Applications and Methods / v.14 no.1 / pp.205-213 / 2007
In this paper, a normal mixture model whose component means are subject to a general linear restriction based on linear regression is proposed, and a fitting method using the EM algorithm and Lagrange multipliers is provided. The model is applied to gene clustering of microarray expression data, where it performs very well on a real data set. The model also allows an analyst to obtain the desired clusters by expressing the hypothesis on the component means through the design matrices and the linear restriction matrices.
https://doi.org/10.5351/CKSS.2007.14.1.205

Learning Graphical Models for DNA Chip Data Mining

Zhang, Byoung-Tak
- Proceedings of the Korean Society for Bioinformatics Conference / 2000.11a / pp.59-60 / 2000
The past few years have seen a dramatic increase in gene expression data on the basis of DNA microarrays or DNA chips. Going beyond a generic view on the genome, microarray data are able to distinguish between gene populations in different tissues of the same organism and in different states of cells belonging to the same tissue. This affords a cell-wide view of the metabolic and regulatory processes under different conditions, building an effective basis for new diagnoses and therapies of diseases. In this talk we present machine learning techniques for effective mining of DNA microarray data. A brief introduction to the research field of machine learning from the computer science and artificial intelligence point of view is followed by a review of recently-developed learning algorithms applied to the analysis of DNA chip gene expression data. Emphasis is put on graphical models, such as Bayesian networks, latent variable models, and generative topographic mapping. Finally, we report on our own results of applying these learning methods to two important problems: the identification of cell cycle-regulated genes and the discovery of cancer classes by gene expression monitoring. The data sets are provided by the competition CAMDA-2000, the Critical Assessment of Techniques for Microarray Data Mining.

Significant Gene Selection Using Integrated Microarray Data Set with Batch Effect

Kim Ki-Yeol;Chung Hyun-Cheol;Jeung Hei-Cheul;Shin Ji-Hye;Kim Tae-Soo;Rha Sun-Young
- Genomics & Informatics / v.4 no.3 / pp.110-117 / 2006
In microarray technology, many diverse experimental features can cause biases, including RNA sources, microarray production or different platforms, diverse sample processing, and various experimental protocols. These systematic effects are a substantial obstacle in the analysis of microarray data. When data sets derived from different experimental processes are used together, the analysis results are largely inconsistent and unreliable. Therefore, one of the most pressing challenges in the microarray field is how to combine data that come from two different groups. As a novel attempt to integrate two data sets with a batch effect, we simply applied standardization to the microarray data before significant gene selection. In the gene selection step, we used a newly defined measure that considers the distance between a gene and an ideal gene as well as the between-slide and within-slide variations. We also discussed the association between biological functions and the different expression patterns in the selected discriminative gene set. As a result, we could confirm that the batch effect was minimized by standardization and that the genes selected from the standardized data included various expression patterns and significant biological functions.
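
The idea of standardizing before gene selection can be sketched as below. The abstract does not say how the standardization was carried out, so this sketch assumes per-batch z-scoring of a single gene's expression values; the function names and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score a list of expression values (mean 0, unit population std)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant vector carries no variation; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def standardize_per_batch(batches):
    """Standardize each batch independently, then pool the results.

    batches maps a batch name to the expression values of ONE gene
    measured in that batch (an assumed data layout).
    """
    pooled = []
    for name in sorted(batches):
        pooled.extend(standardize(batches[name]))
    return pooled


# Hypothetical toy data: the same gene measured in two batches whose
# overall intensity levels differ by a constant offset.
gene = {"batch_a": [1.0, 2.0, 3.0], "batch_b": [11.0, 12.0, 13.0]}
pooled = standardize_per_batch(gene)
print(pooled)
```

After per-batch standardization the offset between the two batches disappears, so the pooled values can be compared on a common scale.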

TMA-OM (Tissue Microarray Object Model) and Integration of Major Genomic Information

Kim Ju-Han
- Proceedings of the Korean Society for Bioinformatics Conference / 2006.02a / pp.30-36 / 2006
Tissue microarray (TMA) is an array-based technology allowing the examination of hundreds of tissue samples on a single slide. To handle, exchange, and disseminate TMA data, we need standard representations of the methods used, of the data generated, and of the clinical and histopathological information related to TMA data analysis. This study aims to create a comprehensive data model with flexibility that supports diverse experimental designs and with expressivity and extensibility that enables an adequate and comprehensive description of new clinical and histopathological data elements. We designed a Tissue Microarray Object Model (TMA-OM). Both the Array Information and the Experimental Procedure models are created by referring to Microarray Gene Expression Object Model, Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments (MISFISHIE), and the TMA Data Exchange Specifications (TMA DES). The Clinical and Histopathological Information model is created by using CAP Cancer Protocols and National Cancer Institute Common Data Elements (NCI CDEs). MGED Ontology, UMLS and the terms extracted from CAP Cancer Protocols and NCI CDEs are used to create a controlled vocabulary for unambiguous annotation. We implemented a web-based application for TMA-OM, supporting data export in XML format conforming to the TMA DES or the DTD derived from TMA-OM. TMA-OM provides a comprehensive data model for storage, analysis and exchange of TMA data and facilitates model-level integration of other biological models.

Genes expression by using cDNA Microarray in Whallak-tang (活絡湯)

Sin, Cheol-Kyung;Lee, Chae-Woo;Yoo, Sun-Ae;Youn, Hyoun-Min;Jang, Kyung-Jeon;Song, Choon-Ho;Ahn, Chang-Beohm;Kim, Cheol-Hong
- Journal of Pharmacopuncture / v.11 no.4 / pp.5-14 / 2008
Objective: This study was undertaken to determine the effect of Whallak-tang on the expression of CD/cytokine genes. Methods: The expression of CD/cytokine genes was examined by cDNA microarray using the human mast cell line (HMC-1). Results: The expression of the ATP5F1, FLJ20671, unknown, KIAA0342, OAS2, and unknown genes was increased in the 200~300% range. The expression of the unknown, MDS006, IFITM1, MRPL3, ZNF207, FTH1, FBP1, NRGN, NR1H2, and KIAA0747 genes was decreased in the 0~33% range. Conclusion: These results provide important basic data on the possibility of clinical treatment with Whallak-tang in musculoskeletal disease.
https://doi.org/10.3831/KPI.2008.11.4.005

An Iterative Normalization Algorithm for cDNA Microarray Medical Data Analysis

Kim, Yoonhee;Park, Woong-Yang;Kim, Ho
- Genomics & Informatics / v.2 no.2 / pp.92-98 / 2004
A cDNA microarray experiment is one of the most useful high-throughput experiments in medical informatics for monitoring gene expression levels. Statistical analysis of cDNA microarray medical data requires a normalization procedure to reduce the systematic errors that cannot be controlled by the experimental conditions. Despite the variety of existing normalization methods, this paper suggests a more general and synthetic normalization algorithm with a control gene set, based on previous studies of normalization. An iterative normalization method, initiated with the housekeeping genes, was used to select and include a new control gene set from among all genes at every step of the normalization calculation. The objective of this iterative normalization was to maintain the pattern of the original data and to keep the gene expression levels stable. Spatial plots, M&A (ratio and average values of the intensity) plots, and box plots showed graphically that the mean across all genes converged to zero after applying our iterative normalization. The practicability of the algorithm was demonstrated by applying our method to data from a human photoaging study.
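
The M and A values mentioned in this abstract are conventionally the log ratio and the average log intensity of the two channels of a spot. The sketch below computes them for a few spots; it does not reproduce the paper's iterative normalization, and the function name `ma_values` and the toy intensities are hypothetical.

```python
import math


def ma_values(red, green):
    """Compute M (log ratio) and A (average log intensity) per spot.

    red and green are background-corrected channel intensities,
    assumed to be strictly positive.
    """
    m_vals = []
    a_vals = []
    for r, g in zip(red, green):
        log_r = math.log2(r)
        log_g = math.log2(g)
        m_vals.append(log_r - log_g)          # M = log2(R / G)
        a_vals.append(0.5 * (log_r + log_g))  # A = mean of the two logs
    return m_vals, a_vals


# Hypothetical toy intensities for three spots.
red = [8.0, 4.0, 16.0]
green = [2.0, 4.0, 4.0]
m_vals, a_vals = ma_values(red, green)
print(m_vals, a_vals)
```

Plotting M against A gives the M&A plot; a well-normalized array is expected to have M values scattered around zero across the range of A.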

Gene Screening and Clustering of Yeast Microarray Gene Expression Data

Lee, Kyung-A;Kim, Tae-Houn;Kim, Jae-Hee
- The Korean Journal of Applied Statistics / v.24 no.6 / pp.1077-1094 / 2011
We perform clustering analyses of yeast cell cycle microarray expression data. To reflect the characteristics of time-course data, we screen the genes using test statistics based on Fourier coefficients, applying an FDR procedure. We compare the results of model-based clustering, K-means, PAM, SOM, the hierarchical Ward method, and the fuzzy method on the yeast data. As validity measures for the clustering results, connectivity, the Dunn index, and silhouette values are computed and compared. A biological interpretation with GO analysis is also included.
https://doi.org/10.5351/KJAS.2011.24.6.1077
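
Gene screening with an FDR procedure can be illustrated with the Benjamini-Hochberg step-up rule. The abstract does not name the procedure the authors used, so choosing Benjamini-Hochberg is an assumption made here for illustration; the function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return sorted indices of hypotheses rejected at FDR level alpha.

    Step-up rule: find the largest rank k (1-based, p-values ascending)
    with p_(k) <= alpha * k / m, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


# Hypothetical per-gene p-values from a screening test.
p_vals = [0.001, 0.045, 0.035, 0.20, 0.008]
kept = benjamini_hochberg(p_vals, alpha=0.05)
print(kept)
```

Genes whose indices are returned would pass the screen and go on to the clustering step; the remaining genes are dropped.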

Analysis of Hemocyte-specific Gene Expression from Bombyx mori

Park, Seung-Won;Goo, Tae-Won;Kim, Seong-Ryul;Kang, Seok-Woo
- International Journal of Industrial Entomology and Biomaterials / v.23 no.1 / pp.137-141 / 2011
Previous data provided information on tissue-specific gene expression in the silkworm by means of a whole-genome oligonucleotide microarray. We analyzed tissue-specific expression patterns in hemocyte tissue on day 5 of 5th instar larvae during the development of B. mori. A total of 5 candidates were picked from the Bombyx mori Microarray Database (BmMDB; http://silkworm.swu.edu.cn/microarray). To verify hemocyte-specific expression, we performed semi-quantitative and real-time quantitative RT-PCR using the highly expressed endogenous Actin RNA as an intrinsic reference. In this study, we confirmed that one gene (sw17255) out of the 5 candidates was expressed in the hemocyte tissue, which was consistent with the previous data. Circulating hemocytes in the body fluid of B. mori are a most powerful target organ for producing biomaterials. Further studies are needed to find the hemocyte-specific promoter region of the sw17255 gene. Finally, this result can be applied in creating transgenic silkworms as biomedical insects.
https://doi.org/10.7852/ijie.2011.23.1.137

A Study of a Biological Information Processing for DNA Microarray Expression Data

Jo, Yeong-Im;Jeong, Hyeon-Cheol
- Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.11a / pp.149-152 / 2007
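This paper briefly introduces the field of bioinformatics and reviews statistical methodology for microarray experiments in functional genomics. It also examines DNA chip design and biological characteristics, and studies and analyzes the statistical methods applied in each area.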

Search Result 360, Processing Time 0.028 seconds
