• Title/Summary/Keyword: Clustering genes

Hierarchical Clustering of Gene Expression Data Based on Self Organizing Map

  • Park, Chang-Beom;Lee, Dong-Hwan;Lee, Seong-Whan
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2003.10a
    • /
    • pp.170-177
    • /
    • 2003
  • Gene expression data are quantitative measurements of the expression levels and ratios of numerous genes under different conditions, derived from microarray image analysis. Gene expression data analysis is the process of extracting meaningful information about genomic diseases and other biological activities from these data. In this paper, we present a hierarchical clustering method for gene expression data based on the self-organizing map (SOM), which allows clustering results to be analyzed more efficiently. The proposed method eliminates the uncertainty of cluster boundaries, an inherent disadvantage of the SOM, while retaining the visualization capability of hierarchical clustering. It also exploits the processing speed of the SOM to handle massive data and makes the SOM clustering result easier to interpret. To verify the efficiency of the proposed algorithm, we ran tests on three data sets: an animal feature data set, a yeast gene expression data set, and a leukemia gene expression data set. The results demonstrated the feasibility and utility of the proposed clustering algorithm.
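
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea: train a small SOM, then run agglomerative clustering on the SOM node prototypes rather than on the raw data. The 1-D map layout, the decay schedules, average linkage, and the function names `train_som` and `average_linkage_merges` are all assumptions made for illustration, not the authors' method.

```python
import math
import random

def train_som(data, n_nodes=4, epochs=50, seed=0):
    """Train a 1-D self-organizing map and return its node prototype vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each prototype from a randomly chosen sample (assumed scheme).
    nodes = [list(rng.choice(data)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                           # decaying learning rate
        radius = max(n_nodes / 2.0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
        for x in rng.sample(data, len(data)):
            # Best-matching unit: the node closest to the sample.
            bmu = min(range(n_nodes), key=lambda j: math.dist(x, nodes[j]))
            for j in range(n_nodes):
                # Gaussian neighbourhood on the 1-D node index.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    nodes[j][d] += lr * h * (x[d] - nodes[j][d])
    return nodes

def average_linkage_merges(points):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(math.dist(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Toy usage: two well-separated groups of 2-D points (illustrative values only).
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
prototypes = train_som(toy, n_nodes=4, epochs=30)
history = average_linkage_merges(prototypes)
```

Clustering the prototypes instead of every gene keeps the quadratic-cost hierarchical step small, which is one plausible reading of the speed benefit the abstract mentions.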

The Sliding Window Gene-Shaving Algorithm for Microarray Data Analysis

  • 이혜선;최대우;전치혁
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2002.06a
    • /
    • pp.139-152
    • /
    • 2002
  • Gene shaving (Hastie et al., 2000) is a useful method for identifying a meaningful group of genes when the variation in expression is large. By shaving off the genes that correlate weakly with the leading principal component, the primary genes with a coherent expression pattern can be identified. Gene shaving works well if expression levels vary enough, but it may miss a meaningful cluster with a low expression level or a different expression time, even when its pattern is coherent. The sliding window gene-shaving method applies gene shaving within each sliding window after hierarchical clustering, to compensate for the loss of meaningful gene sets whose variation is not large but is distinct. Its performance in identifying expression patterns is compared on simulated profile data with different variances and expression levels.
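
As a rough illustration of the basic shaving step described above (not the sliding window variant, whose details the abstract does not give), the sketch below repeatedly computes the leading principal component of the row-centred expression matrix and drops the genes with the smallest absolute inner product with it. The shaving fraction `alpha`, the power-iteration solver and all function names are assumptions for this example.

```python
import math

def center_rows(X):
    """Subtract each gene's mean expression from its row."""
    out = []
    for row in X:
        m = sum(row) / len(row)
        out.append([v - m for v in row])
    return out

def leading_component(X, iters=200):
    """Leading principal component over samples, found by power iteration on X^T X."""
    n = len(X[0])
    # Non-uniform start: a constant vector is orthogonal to every centred row.
    v = [float(i + 1) for i in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iters):
        scores = [sum(a * b for a, b in zip(row, v)) for row in X]            # X v
        w = [sum(scores[g] * X[g][j] for g in range(len(X))) for j in range(n)]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0:
            break
        v = [c / norm for c in w]
    return v

def shave_once(X, gene_ids, alpha=0.1):
    """Drop the fraction alpha of genes (at least one) least aligned with the leading component."""
    v = leading_component(X)
    score = [abs(sum(a * b for a, b in zip(row, v))) for row in X]
    n_drop = max(1, int(alpha * len(X)))
    keep = sorted(range(len(X)), key=lambda g: score[g], reverse=True)[:len(X) - n_drop]
    keep.sort()
    return [X[g] for g in keep], [gene_ids[g] for g in keep]

def gene_shaving_sequence(X, gene_ids, alpha=0.1):
    """Return the nested sequence of gene sets produced by repeated shaving."""
    X = center_rows(X)
    sequence = [list(gene_ids)]
    while len(X) > 1:
        X, gene_ids = shave_once(X, gene_ids, alpha)
        sequence.append(list(gene_ids))
    return sequence
```

Each element of the returned sequence is a candidate cluster; choosing which size to keep is a separate step that this sketch leaves out.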

Toxicogenomic Study to Identify Potential New Mechanistic Markers on Direct-Acting Mutagens in Human Hepatocytes (THLE-3)

  • Kim, Youn-Jung;Song, Mi-Kyung;Song, Mee;Ryu, Jae-Chun
    • Molecular & Cellular Toxicology
    • /
    • v.3 no.4
    • /
    • pp.231-237
    • /
    • 2007
  • Exposure to DNA-damaging agents can elicit a variety of stress-related responses that may alter the expression of genes associated with numerous biological pathways. We used a 19 k whole human genome chip to detect gene expression profiles and potential signature genes in normal human hepatocytes (THLE-3) treated for 3 h with five direct-acting mutagens, furylfuramide (AF-2), N-nitroso-N-methylurea (MNU), methylmethanesulfonate (MMS), 4-nitroquinoline-N-oxide (4-NQO) and 2-nitrofluorene (2NF), at the $IC_{20}$ concentration. Clustering analysis identified 51 common genes that were up-regulated and 45 common genes that were down-regulated more than 1.5-fold by the five direct-acting mutagens. Many of these genes are associated with apoptosis, cell cycle control, regulation of transcription and signal transduction. Genes related to these functions, such as TP73L, E2F5, MST016, SOX5, MAFB, LIF, SII3, TFIIS, EMR1, CYTL1, CX3CR1 and RHOH, were up-regulated. Down-regulated genes were ALOX15B, xs155, IFITM1, BATF, VAV2, CD79A, DCDC2, TNFSF8 and KOX8. We suggest that gene expression profiling of mutagens by toxicogenomic analysis affords promising opportunities to reveal potential new mechanistic markers of genotoxicity.
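
The idea of "common" genes above a fold-change cutoff can be illustrated with a simple set intersection. This is a simplified stand-in for the clustering analysis the authors actually used. The function name `common_regulated`, the input layout and the toy gene labels are hypothetical.

```python
def common_regulated(fold_changes, threshold=1.5):
    """Return (up, down) gene sets that pass the cutoff under every treatment.

    fold_changes maps treatment name -> {gene: treated/control expression ratio}.
    A gene counts as up-regulated if ratio >= threshold and as
    down-regulated if ratio <= 1/threshold (assumed convention).
    """
    up_sets = [{g for g, fc in genes.items() if fc >= threshold}
               for genes in fold_changes.values()]
    down_sets = [{g for g, fc in genes.items() if fc <= 1.0 / threshold}
                 for genes in fold_changes.values()]
    return set.intersection(*up_sets), set.intersection(*down_sets)

# Toy ratios for two hypothetical treatments (not data from the study).
toy = {
    "treatment_1": {"geneA": 2.0, "geneB": 0.5, "geneC": 1.1},
    "treatment_2": {"geneA": 1.8, "geneB": 0.4, "geneC": 2.5},
}
up, down = common_regulated(toy)
```

Here `geneC` exceeds the cutoff under only one treatment, so it is excluded from the common up-regulated set.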

Gene Screening and Clustering of Yeast Microarray Gene Expression Data

  • Lee, Kyung-A;Kim, Tae-Houn;Kim, Jae-Hee
    • The Korean Journal of Applied Statistics
    • /
    • v.24 no.6
    • /
    • pp.1077-1094
    • /
    • 2011
  • We perform clustering analyses of yeast cell cycle microarray expression data. To reflect the characteristics of time-course data, we screen the genes using test statistics based on Fourier coefficients, applying an FDR procedure. On the yeast data, we compare the results of model-based clustering, K-means, PAM, SOM, the hierarchical Ward method and a fuzzy method. As validity measures for the clustering results, connectivity, the Dunn index and silhouette values are computed and compared. A biological interpretation with GO analysis is also included.
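
Two of the validity measures named above can be sketched in a few lines. The Dunn index has several variants; the version below assumes the common one (smallest distance between points in different clusters divided by the largest within-cluster distance). Euclidean distance and the function names are also assumptions.

```python
import math

def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    pairs = [(i, j) for i in range(len(points)) for j in range(i + 1, len(points))]
    between = [math.dist(points[i], points[j]) for i, j in pairs if labels[i] != labels[j]]
    within = [math.dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]]
    return min(between) / max(within)

def mean_silhouette(points, labels):
    """Average silhouette value; points in singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                   # mean distance to own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())     # nearest other cluster
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy usage: the same four points under a good and a poor labelling.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good, poor = [0, 0, 1, 1], [0, 1, 0, 1]
```

For both measures a larger value indicates more compact, better separated clusters, so the good labelling should score higher than the poor one.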

A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

  • Aydadenta, Husna;Adiwijaya, Adiwijaya
    • Journal of Information Processing Systems
    • /
    • v.14 no.5
    • /
    • pp.1167-1175
    • /
    • 2018
  • Microarray data play an essential role in diagnosing and detecting cancer. Microarray analysis allows the examination of gene expression levels in specific cell samples, where thousands of genes can be analyzed simultaneously. However, microarray data contain few samples and have high dimensionality. Therefore, to classify microarray data, a dimensionality reduction process is required. Dimensionality reduction can eliminate redundancy in the data, so that the features used in classification are only those highly correlated with their class. There are two types of dimensionality reduction, namely feature selection and feature extraction. In this paper, we used the k-means algorithm as the clustering approach for feature selection. The proposed approach groups features that have the same characteristics into one cluster, so that redundancy in microarray data is removed. The features in each cluster are ranked using the Relief algorithm to obtain the best-scoring element of each cluster. The best elements of all clusters are selected and used as features in the classification process. Next, the Random Forest algorithm is used. In the simulation, the proposed approach achieved accuracies of 85.87%, 98.9%, and 89% on the Colon, Lung Cancer, and Prostate Tumor datasets, respectively. The accuracy of the proposed approach is therefore higher than that of Random Forest without clustering.
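
The cluster-then-select step described above might look roughly like the sketch below. Two simplifications are assumed and should be read as such: the per-feature score is a plain class-mean gap standing in for Relief, and the final Random Forest classifier is omitted. The deterministic farthest-point initialisation and all function names are also illustrative choices.

```python
import math

def farthest_point_init(vectors, k):
    """Deterministic seeding: start at the first vector, then add the farthest one each time."""
    centers = [list(vectors[0])]
    while len(centers) < k:
        nxt = max(vectors, key=lambda v: min(math.dist(v, c) for c in centers))
        centers.append(list(nxt))
    return centers

def kmeans_labels(vectors, k, iters=20):
    """Plain Lloyd iterations; returns the cluster label of each vector."""
    centers = farthest_point_init(vectors, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def class_mean_gap(feature, y):
    """Stand-in relevance score (NOT Relief): absolute gap between the two class means."""
    pos = [v for v, c in zip(feature, y) if c == 1]
    neg = [v for v, c in zip(feature, y) if c == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))

def select_features(X, y, k, score=class_mean_gap):
    """Cluster the feature columns of X, then keep the best-scoring feature per cluster."""
    features = [list(col) for col in zip(*X)]   # one vector per feature
    labels = kmeans_labels(features, k)
    selected = []
    for c in sorted(set(labels)):
        members = [j for j, lab in enumerate(labels) if lab == c]
        selected.append(max(members, key=lambda j: score(features[j], y)))
    return sorted(selected)
```

Keeping one representative per cluster is what removes redundant, near-duplicate features. The selected column indices would then be passed to whatever classifier is used downstream.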

Identification of Biomarkers for Diagnosis of Gastric Cancer by Bioinformatics

  • Wang, Da-Guang;Chen, Guang;Wen, Xiao-Yu;Wang, Dan;Cheng, Zhi-Hua;Sun, Si-Qiao
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.16 no.4
    • /
    • pp.1361-1365
    • /
    • 2015
  • Background: We aimed to discover potential gene biomarkers for gastric cancer (GC) diagnosis. Materials and Methods: Genechips of 10 GC tissues and 10 gastric mucosa (GM, para-carcinoma tissue, normal control) tissues were generated using an Affymetrix exon array containing 30,000 genes. The differentially expressed genes (DEGs) between GC tissues and normal controls were identified with the Limma package and analyzed by hierarchical clustering. Gene ontology (GO) and pathway enrichment analyses were performed to investigate the functions of the DEGs. Receiver operating characteristic (ROC) analysis was performed to measure the diagnostic value of the biomarker candidates for GC. Results: A total of 896 up-regulated and 60 down-regulated DEGs were identified between GC samples and normal controls. Hierarchical clustering analysis showed that the DEGs were highly differentially expressed and that most DEGs were up-regulated. The most significantly enriched GO-BP term was mitotic cell cycle and the most significantly enriched pathway was cell cycle. The intersection analysis showed that the most significant DEGs were cyclin B1 (CCNB1) and cyclin B2 (CCNB2). The sensitivities and specificities of CCNB1 and CCNB2 were both high (p<0.0001). Areas under the ROC curve for CCNB1 and CCNB2 were both greater than 0.9 (p<0.0001). Conclusions: CCNB1 and CCNB2, which are involved in the cell cycle, played significant roles in the progression and development of GC, and these genes may be potential biomarkers for the diagnosis and prognosis of GC.
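
For readers unfamiliar with the area under the ROC curve reported above, it equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. The sketch below computes it directly from that definition. The function name and the toy expression values are illustrative and are not taken from the study.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison; ties count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy expression values for one candidate gene (hypothetical numbers).
tumour = [3, 4, 5]
normal = [1, 2, 3]
auc = roc_auc(tumour, normal)   # 8.5 favourable pairs out of 9
```

An AUC of 0.5 corresponds to a marker no better than chance, and 1.0 to perfect separation of the two groups.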

Expression of Coat Color Associated Genes in Korean Brindle Cattle by Microarray Analysis

  • Lee, Hae-Lee;Park, Jae-Hee;Kim, Jong Gug
    • Journal of Embryo Transfer
    • /
    • v.30 no.2
    • /
    • pp.99-107
    • /
    • 2015
  • The aim of the present study was to identify coat color associated genes that are differentially expressed in mature Korean brindle cattle (KBC) with different coat colors and in Hanwoo cows. KBC calves, before and after coat color appearance, were also included. Total cellular RNA was isolated from tail hair cells and used for microarray analysis. The number of expressed coat color associated genes/probes was 5813 in mature KBC and Hanwoo cows. Among these, 167 genes were coat color associated genes listed in the Gene card database and 125 genes were pigment and melanocyte genes listed in the Gene ontology_bovine database. There were 23 genes/probes listed in both databases, and their expression was studied further. Of these 23 genes/probes, MLPH, PMEL, TYR and TYRP1 were expressed at levels at least two-fold higher (p<0.01) in KBC with brindle color than in either Hanwoo or KBC with brown color. TYRP1 expression was 22.96-fold and 19.89-fold higher (p<0.01) in KBC with brindle color than in Hanwoo and KBC with brown color, respectively, which was the largest fold difference. The hierarchical clustering analysis indicated that MLPH, PMEL, TYR and TYRP1 were the highly expressed genes in mature cattle. Only a few genes were differentially expressed after coat color appearance in KBC calves. Studies on the regulation and mechanism of expression of the highly expressed genes would be the next steps toward better understanding coat color determination and improving brindle coat color appearance in KBC.

A Study of HME Model in Time-Course Microarray Data

  • Myoung, Sung-Min;Kim, Dong-Geon;Jo, Jin-Nam
    • The Korean Journal of Applied Statistics
    • /
    • v.25 no.3
    • /
    • pp.415-422
    • /
    • 2012
  • In the statistical analysis of microarray data, clustering is a useful exploratory technique that offers the promise of studying the variation of many genes simultaneously. However, most proposed clustering methods do not rigorously handle time-course microarray data or fit a time covariate; a statistical method is therefore needed that forms clusters and represents the linear trend of each cluster. In this research, we developed a modified hierarchical mixture of experts (HME) model that clusters the data and characterizes each cluster using a linear mixed effect model. The feasibility of the proposed method is illustrated by an application to the human fibroblast data of Iyer et al. (1999).

In-silico inferences for expression data using IGAM: Applied to Fuzzy-Clustering & Regulatory Network Modeling

  • Lee, Philhyone;Hojeong Nam;Lee, Doheon;Lee, Kwang H.
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2004.04a
    • /
    • pp.273-276
    • /
    • 2004
  • Genome-scale expression data provide valuable insights about organisms, but the biological validation of in-silico analysis is difficult and often controversial. Here we present a new approach for integrating previously established knowledge with computational analysis. Based on known biological evidence, IGAM (Integrated Gene Association Matrix) automatically estimates the relatedness between a pair of genes. We combined this association knowledge with regulatory network modeling and fuzzy clustering in the yeast S. cerevisiae. The approach was found to be more effective for extracting biological meaning from in-silico inferences on gene expression data.

Gene Expression Analysis of Hepatic Response Induced by Gentamicin in Mice

  • Oh, Jung-Hwa;Park, Han-Jin;Hwang, Ji-Yoon;Jeong, Sun-Young;Lim, Jung-Sun;Kim, Yong-Bum;Yoon, Seok-Joo
    • Molecular & Cellular Toxicology
    • /
    • v.3 no.1
    • /
    • pp.60-67
    • /
    • 2007
  • Gentamicin is a broad-spectrum aminoglycoside antibiotic used in the treatment of bacterial infection. Although side effects of gentamicin such as nephrotoxicity and ototoxicity have been investigated, information on its hepatic effects is still limited. In the present study, gene expression profiles were analyzed in the liver of gentamicin-treated mice using the Affymetrix GeneChip® Mouse Expression 430A 2.0 Array. In total, 400 genes were identified as up- or down-regulated by more than 1.5-fold (P<0.01) in the liver of gentamicin-treated mice. Among these deregulated genes, 16 up-regulated genes mainly involved in transport (Kif5b, Pex14, Rab14, Clcn3, and Necap1) and 20 down-regulated genes involved in lipid and other metabolism (Hdlbp, Gm2a, Uroc1, and Dak) were selected using the k-means clustering algorithm. The functional classification of the differentially expressed genes indicated that several stress-related genes were regulated in the liver by gentamicin treatment. These data may contribute to understanding the molecular mechanism at work in the liver of gentamicin-treated mice.