• Title/Summary/Keyword: Clustering genes

Feature-based Gene Classification and Region Clustering using Gene Expression Grid Data in Mouse Hippocampal Region

  • Kang, Mi-Sun; Kim, HyeRyun; Lee, Sukchan; Kim, Myoung-Hee
    • Journal of KIISE / v.43 no.1 / pp.54-60 / 2016
  • Brain gene expression information is closely related to the structural and functional characteristics of the brain. Thus, extensive research has been carried out on the relationship between gene expression patterns and the brain's structural organization. In this study, Principal Component Analysis was used to extract features of gene expression patterns, and genes were automatically classified by their spatial distribution. Voxels were then clustered using the genes classified as region-specific. Finally, we visualized the clustering results for gene expression in the mouse hippocampal region with the Allen Brain Atlas. This experiment allowed us to classify region-specific gene expression in the mouse hippocampal region and to visualize the clustering results together with the brain atlas in an integrated view. This approach may allow neuroscientists to find experimental groups of genes more quickly and to design effective experiments for this new form of data. It may also enable the discovery of finer sub-regions beyond the currently known anatomical regions of the brain.
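The pipeline described above (PCA features of each gene's spatial expression pattern, then clustering of voxels) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 6 × 8 genes-by-voxels matrix is invented, the sketch only shows that genes with similar spatial patterns fall on the same side of the first component rather than performing the paper's classification, and plain k-means with deterministic farthest-first seeding stands in for whatever clustering algorithm the paper used.

```python
import numpy as np

def pca_features(X, n_components=2):
    """Score each row of X (one gene's expression over all voxels) on the leading principal components."""
    Xc = X - X.mean(axis=0)                      # center every voxel column across genes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # genes x n_components feature matrix

def kmeans(points, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-first seeding; returns one label per row."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])       # next seed: the point farthest from existing seeds
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)                # assign each point to its nearest center
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Hypothetical toy grid: 6 genes x 8 voxels; genes 0-2 are expressed in voxels 0-3,
# genes 3-5 in voxels 4-7.
X = np.array([
    [5, 5, 4, 5, 0, 0, 1, 0],
    [4, 5, 5, 4, 0, 1, 0, 0],
    [5, 4, 5, 5, 1, 0, 0, 0],
    [0, 0, 1, 0, 5, 4, 5, 5],
    [0, 1, 0, 0, 4, 5, 5, 4],
    [1, 0, 0, 0, 5, 5, 4, 5],
], dtype=float)

gene_features = pca_features(X, n_components=1)  # genes with similar spatial patterns share a sign
voxel_labels = kmeans(X.T, k=2)                  # each voxel is described by its gene-expression vector
```

In this toy grid the voxels split into {0-3} and {4-7}, mirroring the two groups of region-specific genes.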

A hybrid method to compose an optimal gene set for multi-class classification using mRMR and modified particle swarm optimization

  • Lee, Sunho
    • The Korean Journal of Applied Statistics / v.33 no.6 / pp.683-696 / 2020
  • The aim of this research is to find an optimal gene set that provides highly accurate multi-class classification with a minimum number of genes. A two-stage procedure is proposed. In the data-filtering stage, based on the minimum redundancy and maximum relevance (mRMR) framework, several statistics are used to rank differentially expressed genes, and K-means clustering is used to reduce redundancy between genes. In the second stage, a modified particle swarm optimization selects a small subset of informative genes. Two well-known multi-class microarray data sets, ALL and SRBCT, are analyzed to demonstrate the effectiveness of this hybrid method.
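The mRMR filtering idea can be illustrated with a small greedy selector. This sketch assumes one particular mRMR variant: relevance is a one-way ANOVA F statistic rescaled to [0, 1], and redundancy is the mean absolute Pearson correlation with the genes already chosen. The paper combines several ranking statistics with K-means and a modified particle swarm optimization, none of which is reproduced here, and the data are hypothetical.

```python
import numpy as np

def f_statistic(x, y):
    """One-way ANOVA F statistic of one gene's expression x across class labels y."""
    classes = np.unique(y)
    grand = x.mean()
    ss_between = sum((y == c).sum() * (x[y == c].mean() - grand) ** 2 for c in classes)
    ss_within = sum(((x[y == c] - x[y == c].mean()) ** 2).sum() for c in classes)
    df_between, df_within = len(classes) - 1, len(x) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within + 1e-12)

def mrmr_select(X, y, n_select):
    """Greedy mRMR (difference form) over the columns (genes) of X."""
    relevance = np.array([f_statistic(X[:, g], y) for g in range(X.shape[1])])
    relevance = relevance / relevance.max()       # rescale so it is comparable with |correlation|
    redundancy = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(relevance.argmax())]
    while len(selected) < n_select:
        remaining = [g for g in range(X.shape[1]) if g not in selected]
        # relevance minus mean redundancy with the genes picked so far
        scores = [relevance[g] - redundancy[g, selected].mean() for g in remaining]
        selected.append(remaining[int(np.argmax(scores))])
    return selected

# Hypothetical expression matrix: 9 samples x 4 genes, 3 classes.
y = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
g0 = np.array([0, 0.1, 0.2, 5, 5.1, 5.2, 10, 10.1, 10.2])
X = np.column_stack([
    g0,
    2 * g0 + 1,                                          # redundant copy of gene 0
    np.array([0, 0.2, 0.1, 5, 5.2, 5.1, 0, 0.1, 0.2]),   # separates class 1 only
    np.array([1, 2, 3, 1, 2, 3, 1, 2, 3], dtype=float),  # no class signal
])
chosen = mrmr_select(X, y, 2)
```

Gene 1 is a rescaled copy of gene 0, so once either is chosen its relevance is cancelled by a correlation of 1, and the less relevant but non-redundant gene 2 is picked next.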

Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan; Yang, Tae-Min; Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology / v.40 no.8 / pp.726-732 / 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Over the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Clustering analyses that apply dimensionality reduction techniques have also been proposed recently. PCA (principal component analysis) is a popular dimensionality reduction technique for clustering problems. However, previous studies analyzed the performance of PCA only on full data sets. In this paper, to evaluate the performance of PCA for clustering analysis specifically and robustly, we apply an improved FCBF (fast correlation-based filter) feature selection method to supervised clustering data sets and employ two well-known clustering algorithms: k-means and k-medoids. Computational results on supervised data sets show that the performance of PCA is very poor for large-scale feature sets.
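To make the comparison concrete, the following sketch clusters samples with a simple alternating (Voronoi-iteration) k-medoids, either on the raw features or after projecting onto the leading principal components. It is an illustration under assumptions: the FCBF feature-selection step and k-means are not shown, the seeding is a deterministic farthest-first rule chosen for reproducibility, and the six-sample data set is made up.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def k_medoids(X, k, n_iter=100):
    """Alternating (Voronoi-iteration) k-medoids on Euclidean distances; returns (labels, medoid indices)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    medoids = [0]                                   # farthest-first seeding for reproducibility
    for _ in range(1, k):
        medoids.append(int(D[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)       # assign each sample to its nearest medoid
        new = medoids.copy()
        for j in range(k):
            members = np.where(labels == j)[0]
            # new medoid: the member with the smallest total distance to the rest of its cluster
            new[j] = members[D[np.ix_(members, members)].sum(axis=1).argmin()]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return labels, medoids

# Hypothetical samples x features matrix with two well-separated groups.
samples = np.array([
    [0.0, 0.0, 0.0, 0.0], [0.1, 0.0, 0.2, 0.0], [0.2, 0.1, 0.0, 0.1],
    [5.0, 5.0, 5.0, 5.0], [5.1, 5.0, 4.9, 5.2], [4.8, 5.1, 5.0, 5.0],
])
labels_full, _ = k_medoids(samples, 2)
labels_pca, _ = k_medoids(pca_project(samples, 1), 2)
```

On data this cleanly separated both runs agree; the paper's point is that such agreement breaks down when the number of features is large.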

Fuzzy Cluster Analysis of Gene Expression Profiles Using Evolutionary Computation and Adaptive $\alpha$-cut based Evaluation

  • Park, Han-Saem; Cho, Sung-Bae
    • Journal of KIISE:Software and Applications / v.33 no.8 / pp.681-691 / 2006
  • Clustering is one of the most widely used methods for grouping thousands of genes by the similarity of their expression levels, which helps in analyzing gene expression profiles. It has been used to identify the functions of genes. Fuzzy clustering, one category of clustering, assigns each sample to multiple groups according to its degrees of membership. This makes it more appropriate for analyzing gene expression profiles, because a single gene may be involved in multiple genetic functions. However, clustering methods are sensitive to initialization and can become trapped in local optima. To solve these problems, this paper proposes an evolutionary fuzzy clustering method in which adaptive $\alpha$-cut based evaluation is used for fitness evaluation. Unlike the Bayesian validation method, which applies the same criterion to all datasets, the adaptive $\alpha$-cut applies different criteria according to the characteristics of each dataset. We conducted experiments with the SRBCT and yeast cell-cycle datasets and analyzed the results to confirm the usefulness of the proposed method.
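Fuzzy memberships and the α-cut can be illustrated with plain fuzzy c-means. This is only a sketch: the paper searches for clusters with evolutionary computation and adapts α to each dataset, whereas here the centers come from the standard alternating FCM updates (fuzzifier m = 2 assumed), α is fixed by hand, and the seven two-dimensional profiles are hypothetical. Sample 6 sits halfway between the two groups, so it is assigned to both clusters at a low α and to neither at a high one.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6):
    """Standard fuzzy c-means; returns (centers, U) with U[i, j] = membership of sample i in cluster j."""
    # Farthest-first seeding of the centers (deterministic; not the paper's evolutionary search).
    centers = [X[0]]
    for _ in range(1, c):
        d = np.min([np.linalg.norm(X - ctr, axis=1) for ctr in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = np.fmax(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), 1e-12)
        inv = d ** (-2.0 / (m - 1))
        U = inv / inv.sum(axis=1, keepdims=True)    # each row of memberships sums to 1
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        if np.abs(new_centers - centers).max() < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, U

def alpha_cut(U, alpha):
    """Crisp member lists: sample i belongs to cluster j whenever U[i, j] >= alpha."""
    return [np.where(U[:, j] >= alpha)[0].tolist() for j in range(U.shape[1])]

# Hypothetical expression profiles: two groups plus one ambiguous gene (index 6).
profiles = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 9], [9, 10], [5, 5]], dtype=float)
centers, U = fuzzy_c_means(profiles, c=2)
strict = alpha_cut(U, 0.9)    # ambiguous gene excluded from both clusters
loose = alpha_cut(U, 0.4)     # ambiguous gene included in both clusters
```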

Finding significant genes using factor analysis

  • Lee, Jeong-Wha; Lee, Hye-Seon; Park, Hae-Sang; Jun, Chi-Hyuck
    • Proceedings of the Korean Operations and Management Science Society Conference / 2006.11a / pp.427-430 / 2006
  • Clustering gene expression data without filtering out noise genes may give distorted results or lead to inappropriate inferences. Identifying significant genes and removing noise before the main analysis is necessary for meaningful discovery from gene expression patterns. We propose a new method of finding significant genes using factor analysis performed on the transposed data matrix. We construct a significance score as the sum of factor loadings over a declared number of significant factors, and set the threshold through replication. The proposed method works well for finding significant genes in simulated time-course data, even as the variance level increases.
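One plausible reading of the significance score is sketched below. Several details are assumptions rather than the paper's method: loadings come from the principal-component (principal factor) solution of the gene-by-gene correlation matrix instead of a fitted factor model, "sum of factor loadings" is taken as the sum of absolute loadings, and the replication-based threshold is approximated by shuffling each gene's time course. The eight toy genes (three sharing one sinusoidal time course, five following mutually orthogonal patterns) are invented.

```python
import numpy as np

def loading_matrix(data, n_factors):
    """Principal-component loadings; data is genes x time points and is transposed so genes are the variables."""
    Z = data.T                                   # time points x genes
    R = np.corrcoef(Z, rowvar=False)             # gene-by-gene correlation matrix
    eigval, eigvec = np.linalg.eigh(R)           # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_factors]
    return eigvec[:, order] * np.sqrt(np.clip(eigval[order], 0, None))

def significance_scores(data, n_factors):
    """Per-gene score: sum of absolute loadings over the retained factors."""
    return np.abs(loading_matrix(data, n_factors)).sum(axis=1)

def permutation_threshold(data, n_factors, n_rep=50, q=0.95, seed=0):
    """Threshold = q-quantile of scores when each gene's time course is shuffled independently."""
    rng = np.random.default_rng(seed)
    null = []
    for _ in range(n_rep):
        shuffled = np.array([rng.permutation(row) for row in data])
        null.extend(significance_scores(shuffled, n_factors))
    return float(np.quantile(null, q))

# Hypothetical time courses over 10 points: genes 0-2 share one pattern, genes 3-7 do not.
t = 2 * np.pi * np.arange(10) / 10
signal = [np.sin(t), 2 * np.sin(t) + 1, 0.5 * np.sin(t) - 3]
background = [np.cos(t), np.sin(2 * t), np.cos(2 * t), np.sin(3 * t), np.cos(3 * t)]
data = np.array(signal + background)
scores = significance_scores(data, n_factors=1)
threshold = permutation_threshold(data, n_factors=1)
```

Genes whose score exceeds the threshold would then be declared significant.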

A comparison study of classification method based of SVM and data depth in microarray data

  • Hwang, Jin-Soo; Kim, Jee-Yun
    • Journal of the Korean Data and Information Science Society / v.20 no.2 / pp.311-319 / 2009
  • Jornsten (2004) used a robust L1 data depth for clustering and classification, in methods called DDclus and DDclass. SVM-based classification works well in most situations but shows some weakness in the presence of outliers. Proper gene selection is important in classification because there are so many redundant genes. Either selecting appropriate genes or clustering genes in combination with a classification method enhances overall classification performance. The performance of the depth-based method is evaluated against several SVM-based classification methods.
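The depth-based classification rule can be sketched as follows: compute the L1 (Vardi-Zhang spatial) depth of a new sample within each class and assign it to the class in which it is deepest. This is a minimal version of the idea behind DDclass under our own assumptions; Jornsten's (2004) method includes refinements not shown here, and the training points are hypothetical.

```python
import numpy as np

def l1_depth(x, sample):
    """Vardi-Zhang L1 (spatial) depth of point x with respect to the rows of sample."""
    diff = x - sample
    norms = np.linalg.norm(diff, axis=1)
    away = norms > 0                             # points that coincide with x carry no direction
    if not away.any():
        return 1.0
    mean_direction = (diff[away] / norms[away, None]).sum(axis=0) / len(sample)
    mass_at_x = (~away).sum() / len(sample)      # fraction of the sample located exactly at x
    return 1.0 - max(0.0, float(np.linalg.norm(mean_direction)) - mass_at_x)

def depth_classify(x, training):
    """Assign x to the class label whose training sample gives it the largest L1 depth."""
    return max(training, key=lambda label: l1_depth(x, training[label]))

# Hypothetical two-gene training data for two classes.
square = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
training = {"A": square, "B": square + 10.0}
prediction = depth_classify(np.array([0.2, 0.1]), training)
```

Because the depth uses only unit direction vectors, a single training point changes the average direction by at most 2/n however far away it lies, which is the robustness to outliers that motivates the comparison with SVMs.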

A Review of Cluster Analysis for Time Course Microarray Data

  • Sohn, In-Suk; Lee, Jae-Won; Kim, Seo-Young
    • The Korean Journal of Applied Statistics / v.19 no.1 / pp.13-32 / 2006
  • Biologists are attempting to group genes based on the temporal patterns of their expression levels. A number of methods have been proposed for clustering microarray data. However, clustering results depend on which genes are selected, so selecting genes with significant expression differences is also very important when clustering microarray data. This paper therefore presents the results of broad comparative studies on time-course microarray data, considering methods of gene selection, clustering, and cluster validation.
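As one example of the cluster-validation step, the sketch below computes the mean silhouette width of a clustering of expression profiles. The choice of Euclidean distance and of the silhouette index are assumptions for illustration (correlation-based distances are also common for time-course data), and the four toy profiles are hypothetical.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette width (Euclidean distance); requires at least two clusters."""
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    widths = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False                          # exclude the point itself
        if not same.any():
            widths.append(0.0)                   # singleton cluster: width defined as 0
            continue
        a = D[i, same].mean()                    # mean distance within its own cluster
        b = min(D[i, labels == other].mean()     # mean distance to the nearest other cluster
                for other in set(labels.tolist()) if other != labels[i])
        widths.append((b - a) / max(a, b))
    return float(np.mean(widths))

# Hypothetical time courses: two rising and two falling genes over four time points.
profiles = np.array([[0, 1, 2, 3], [0, 1.1, 2.1, 3], [3, 2, 1, 0], [3, 2.1, 0.9, 0]], dtype=float)
good = silhouette(profiles, [0, 0, 1, 1])   # groups rising with rising genes
bad = silhouette(profiles, [0, 1, 0, 1])    # mixes rising and falling genes
```

Values near 1 indicate compact, well-separated clusters; negative values indicate that many genes sit closer to another cluster than to their own.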

Modeling Large S-System using Clustering and Genetic Algorithm

  • Jung, Sung-Won; Lee, Kwang-H.; Lee, Co-Heon
    • Proceedings of the Korean Society for Bioinformatics Conference / 2005.09a / pp.197-201 / 2005
  • When inferring regulatory relationships between genes from gene expression data, dimensionality is one of the biggest problems. In general, the size of the search space for modeling regulatory relationships grows as $O(n^2)$ in the number of genes $n$. However, it can be reduced to $O(kn)$ for a selected $k$ by applying divide-and-conquer heuristics, which depend on some assumptions about the genetic network. In this paper, we approach the modeling problem in a divide-and-conquer manner: we apply clustering to split the problem into small sub-problems, and then apply a hierarchical modeling process to those sub-problems.
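The complexity claim can be made concrete by counting parameters. The count below assumes the standard S-system form, in which each gene $i$ has a production rate constant $\alpha_i$ with kinetic orders $g_{ij}$ and a degradation rate constant $\beta_i$ with kinetic orders $h_{ij}$. It also assumes that, while the sub-problems are solved, each gene is modeled only in terms of the $k$ genes in its own cluster (with $k$ dividing $n$ for simplicity):

$$\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n} X_j^{g_{ij}} - \beta_i \prod_{j=1}^{n} X_j^{h_{ij}}, \qquad i = 1, \dots, n$$

$$\text{full model: } n\,(2n+2) = 2n^2 + 2n = O(n^2)$$

$$\text{clustered: } \frac{n}{k}\cdot k\,(2k+2) = 2n(k+1) = O(kn)$$

For example, with $n = 100$ genes the full model has 20,200 parameters, while clusters of $k = 10$ genes give $2 \cdot 100 \cdot 11 = 2{,}200$.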

Mitochondrial DNA-based investigation of dead rorqual (Cetacea: Balaenopteridae) from the west coast of India

  • Shantanu Kundu; Manokaran Kamalakannan; Dhriti Banerjee; Flandrianto Sih Palimirmo; Arif Wibowo; Hyun-Woo Kim
    • Fisheries and Aquatic Sciences / v.27 no.1 / pp.48-55 / 2024
  • The study assessed the utility of mitochondrial DNA for identifying a dead rorqual found off the west coast of India. In a global BLAST search, both the COI and Cytb sequences showed 99-100% similarity to the GenBank sequence of Balaenoptera musculus, confirming their affiliation with this species. Compared with other Balaenopteridae species, inter-species genetic distances ranged from 6.75% to 9.80% for COI and from 7.37% to 10.96% for Cytb. Bayesian phylogenies constructed from both the COI and Cytb genes showed clear, separate clustering of all Balaenopteridae species, reaffirming their distinctiveness, and placed the generated sequences as a cohesive cluster within the B. musculus clade. Beyond species confirmation, this study provides insights into the presence of live and dead B. musculus individuals in Indian marine ecosystems. This information could help guide long-term conservation efforts aimed at safeguarding Important Marine Mammal Areas (IMMAs) in India.
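The abstract does not say which substitution model was used for the genetic distances. The Kimura two-parameter (K2P) model is common in COI barcoding studies, so the sketch below shows K2P alongside the uncorrected p-distance, as an illustration only. It assumes the two sequences are already aligned and skips gaps and ambiguity codes; multiplying the result by 100 gives a percentage comparable to those reported.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences (gaps/ambiguous sites skipped)."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue                             # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1                     # A<->G or C<->T
        else:
            transversions += 1                   # purine <-> pyrimidine
    P, Q = transitions / compared, transversions / compared
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)

def p_distance(seq1, seq2):
    """Proportion of differing sites among unambiguous aligned positions."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a in "ACGT" and b in "ACGT"]
    return sum(a != b for a, b in pairs) / len(pairs)
```

K2P corrects for multiple substitutions at the same site, so it is always at least as large as the p-distance for the same pair of sequences.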

Genomic Insights into the Rice Blast Fungus through Estimation of Gene Emergence Time in Phylogenetic Context

  • Choi, Jaeyoung; Lee, Jong-Joon; Jeon, Junhyun
    • Mycobiology / v.46 no.4 / pp.361-369 / 2018
  • The rice blast fungus, Magnaporthe oryzae, is an important pathogen of rice plants. Genes encoded in a genome are known to have different evolutionary histories related to their functions. Phylostratigraphy is a method that correlates the evolutionary origin of genes with evolutionary transitions. Here we applied phylostratigraphy to partition the total gene content of M. oryzae into distinct classes (phylostrata), designated PS1 to PS7, based on estimates of their emergence time. Genes in individual phylostrata showed no significant bias in their global distribution among the seven chromosomes, but at the local level, genes belonging to the same phylostratum were found to cluster together. Our phylostrata-wide analysis revealed that genes in the same phylostratum tend to be similar in many physical and functional characteristics, such as gene length and structure, GC content, codon adaptation index, and level of transcription, which correlate with biological functions in an evolutionary context. We also found that a significant proportion of genes in the genome are orphans, for which no orthologs can be detected in the database. Among them, we narrowed the set down to seven orphan genes with transcriptional and translational evidence, and showed that one of them is implicated in asexual reproduction and virulence, suggesting ongoing evolution in this fungus through lineage-specific genes. Our results provide a genomic basis for linking the functions of pathogenicity factors to gene emergence time.
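The two phylostratigraphy steps in the abstract (assigning each gene to a stratum and testing whether genes of the same stratum cluster locally) can be sketched as follows. Everything here is an assumption for illustration: strata are numbered from 1 (oldest) to 7 (youngest), orphan genes with no detectable homolog outside M. oryzae are placed in the youngest stratum, homolog search results are assumed to be already mapped to strata, and local clustering is measured with a permutation test on the number of adjacent genes that share a stratum.

```python
import random

def assign_phylostratum(hit_strata, youngest=7):
    """Oldest stratum (1 = oldest) in which a homolog was detected; genes without any hit are orphans."""
    return min(hit_strata) if hit_strata else youngest

def same_stratum_neighbours(strata):
    """Number of adjacent gene pairs along a chromosome that belong to the same stratum."""
    return sum(a == b for a, b in zip(strata, strata[1:]))

def clustering_p_value(strata, n_perm=1000, seed=0):
    """One-sided permutation p-value for more same-stratum neighbours than expected by chance."""
    rng = random.Random(seed)
    observed = same_stratum_neighbours(strata)
    shuffled = list(strata)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                    # random gene order along the chromosome
        if same_stratum_neighbours(shuffled) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)

# Hypothetical homolog hits: gene -> set of strata in which a homolog was found.
hits = {"geneA": {1, 3, 5}, "geneB": {4}, "geneC": set()}
strata = {gene: assign_phylostratum(h) for gene, h in hits.items()}
```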