• Title/Summary/Keyword: Self-organizing Maps

Application of Multispectral Remotely Sensed Imagery for the Characterization of Complex Coastal Wetland Ecosystems of southern India: A Special Emphasis on Comparing Soft and Hard Classification Methods

  • Shanmugam, Palanisamy;Ahn, Yu-Hwan;Sanjeevi, Shanmugam
    • Korean Journal of Remote Sensing / v.21 no.3 / pp.189-211 / 2005
  • This paper compares a recently developed soft classification method based on Linear Spectral Mixture Modeling (LSMM) with traditional hard classification methods based on the Iterative Self-Organizing Data Analysis (ISODATA) and Maximum Likelihood Classification (MLC) algorithms, with the aim of mapping, monitoring and preserving valuable coastal wetland ecosystems of southern India using Indian Remote Sensing Satellite (IRS) 1C/1D LISS-III and Landsat-5 Thematic Mapper (TM) image data. ISODATA and MLC were applied to these satellite images to produce maps of 5, 10, 15 and 20 wetland classes for each of three contrasting coastal wetland sites: Pitchavaram, Vedaranniyam and Rameswaram. The accuracy of the derived classes was assessed with a simple descriptive statistic, overall accuracy, and a discrete multivariate statistic, KAPPA accuracy. ISODATA produced maps with poorer accuracy than MLC. However, both overall accuracy and KAPPA accuracy decreased systematically as more classes were derived from the IRS-1C/1D and Landsat-5 TM imagery by either method. Two principal factors reduced classification accuracy: spectral overlap/confusion and the inadequate spatial resolution of the sensors. Compared with the former, the limited instantaneous field of view (IFOV) of these sensors produced a number of mixed pixels (mixels) in the images, and their effect on the classification process was a major obstacle to deriving accurate wetland cover types, despite the increasing spatial resolution of new-generation Earth Observation Sensors (EOS).
To improve classification accuracy, a soft classification method based on LSMM was used to estimate the spectral mixture and classify the IRS-1C/1D LISS-III and Landsat-5 TM imagery. The method first identifies the number of reflectance end-members that form the scene spectra, then determines their nature, and finally decomposes the spectra into their end-members. To evaluate the LSMM areal estimates, the resulting end-member fractions were compared with the normalized difference vegetation index (NDVI), ground truth data, and estimates derived from the traditional hard classifier (MLC). NDVI values and vegetation fractions were positively correlated ($r^2$ = 0.96, 0.95 and 0.92 for Rameswaram, Vedaranniyam and Pitchavaram, respectively), and NDVI and soil fraction values were negatively correlated ($r^2$ = 0.53, 0.39 and 0.13), indicating the reliability of the sub-pixel classification. Compared with ground truth data, the precision of LSMM was 92% for the moisture fraction and 96% for the soil fraction. In general, LSMM appears well suited to locating small wetland habitats that occur as sub-pixel inclusions and to representing continuous gradations between different habitat types.
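
As a rough illustration of the linear mixing idea in this abstract (not the authors' implementation), the sketch below unmixes a pixel into two end-member fractions under a sum-to-one constraint and computes NDVI. The end-member spectra, the (red, near-infrared) band order and the function names are hypothetical; the paper's actual end-member set is not given in the abstract.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index from red and near-infrared reflectance."""
    return (nir - red) / (nir + red)


def unmix_two_endmembers(pixel, endmember_a, endmember_b):
    """Least-squares fraction of endmember_a in a pixel modeled as
    f * endmember_a + (1 - f) * endmember_b, so the fractions sum to one."""
    diff = [a - b for a, b in zip(endmember_a, endmember_b)]
    resid = [p - b for p, b in zip(pixel, endmember_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("end-members must differ")
    f = sum(r * d for r, d in zip(resid, diff)) / denom
    f = min(1.0, max(0.0, f))  # clip to the physically meaningful range
    return f, 1.0 - f


# Hypothetical reflectances in (red, near-infrared) order.
vegetation = [0.05, 0.50]
soil = [0.30, 0.35]
mixed_pixel = [0.25 * v + 0.75 * s for v, s in zip(vegetation, soil)]

veg_fraction, soil_fraction = unmix_two_endmembers(mixed_pixel, vegetation, soil)
```

With more than two end-members the same model becomes a constrained least-squares problem per pixel, which this two-end-member closed form does not cover.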

Input Pattern Vector Extraction and Pattern Recognition of Taste using fMRI (fMRI를 이용한 맛의 입력패턴벡터 추출 및 패턴인식)

  • Lee, Sun-Yeob;Lee, Yong-Gu;Kim, Dong-Ki
    • Journal of radiological science and technology / v.30 no.4 / pp.419-426 / 2007
  • In this paper, input pattern vectors are extracted and a learning algorithm is designed to recognize taste (bitter, sweet, sour and salty) pattern vectors. The signal intensities of the tastes are used to compose the input pattern vectors. For taste pattern recognition, the SOM (Self-Organizing Maps) algorithm is used to learn the initial reference vectors, and the out-star learning algorithm is used to determine the class of the output neurons of the subclass layer. The weights between the input layer and the subclass layer are first learned with the SOM algorithm, which determines the initial reference vectors, and are then refined with the LVQ (Learning Vector Quantization) algorithm. The pattern vectors are classified into subclasses by the neurons in the subclass layer, and the weights between the subclass layer and the output layer are learned to assign each subclass to the class that contains it. The proposed algorithm was compared in simulation with the conventional LVQ, and the proposed learning method was confirmed to classify more successfully than the conventional LVQ.
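
The two weight-update rules named in this abstract can be sketched as follows. This is a minimal illustration of a generic SOM step on a one-dimensional map and a generic LVQ1 step, not the authors' network: the function names, the map layout, the learning rate and the example values are assumptions, and the out-star layer is not covered.

```python
def nearest(weights, x):
    """Index of the reference vector closest to x (squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))


def som_step(weights, x, lr, radius):
    """One SOM update on a 1-D map: pull the winner and its map neighbors toward x."""
    win = nearest(weights, x)
    return [
        [w + lr * (v - w) for w, v in zip(unit, x)] if abs(i - win) <= radius else list(unit)
        for i, unit in enumerate(weights)
    ]


def lvq1_step(weights, labels, x, target, lr):
    """One LVQ1 update: pull the winner toward x if its label matches, else push it away."""
    win = nearest(weights, x)
    sign = 1.0 if labels[win] == target else -1.0
    updated = [list(unit) for unit in weights]
    updated[win] = [w + sign * lr * (v - w) for w, v in zip(weights[win], x)]
    return updated


# Hypothetical two-unit map with 2-D reference vectors.
reference = [[0.0, 0.0], [1.0, 1.0]]
after_som = som_step(reference, [0.2, 0.0], lr=0.5, radius=0)
after_lvq = lvq1_step(reference, ["sweet", "sour"], [0.2, 0.0], "sour", lr=0.5)
```

Both functions return new weight lists, so an unsupervised SOM pass can initialize the reference vectors before supervised LVQ steps refine them.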

Fast VQ Codebook Design by Sucessively Bisectioning of Principle Axis (주축의 연속적 분할을 통한 고속 벡터 양자화 코드북 설계)

  • Kang, Dae-Seong;Seo, Seok-Bae;Kim, Dai-Jin
    • Journal of KIISE:Software and Applications / v.27 no.4 / pp.422-431 / 2000
  • This paper proposes a new codebook generation method, called PCA-based VQ, that incorporates the PCA (Principal Component Analysis) technique into VQ (Vector Quantization) codebook design. The PCA technique reduces the data dimensions by transforming input image vectors into feature vectors. The cluster of feature vectors in the transformed domain is bisected into two subclusters by an optimally chosen partitioning hyperplane. The search for the optimal partitioning hyperplane, which is the most time-consuming step, is accelerated by exploiting three properties: (1) the optimal partitioning hyperplane is perpendicular to the first principal axis of the feature vectors, (2) it is located at the equilibrium point of the left and right clusters' distortions, and (3) the left and right clusters' distortions can be updated incrementally. This principal-axis bisection is applied successively to the cluster whose distortion is reduced the most by bisection, until the total distortion of the clusters falls to the desired level. Simulation results show that the proposed PCA-based VQ method is promising: its reconstruction performance is as good as that of the SOFM (Self-Organizing Feature Maps) method, and its codebook generation is as fast as that of the K-means method.
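
A single bisection step of the kind described above might look like the sketch below. It is a simplification under stated assumptions: the hyperplane is placed through the cluster centroid rather than at the distortion equilibrium point the paper uses, the principal axis is found by plain power iteration, and all function names and data are hypothetical.

```python
import math


def centroid(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def distortion(vectors):
    """Sum of squared distances from each vector to the cluster centroid."""
    c = centroid(vectors)
    return sum(sum((x - m) ** 2 for x, m in zip(v, c)) for v in vectors)


def principal_axis(vectors, iterations=200):
    """First principal axis via power iteration on the covariance matrix."""
    c = centroid(vectors)
    centered = [[x - m for x, m in zip(v, c)] for v in vectors]
    dim = len(c)
    cov = [[sum(row[i] * row[j] for row in centered) / len(vectors)
            for j in range(dim)] for i in range(dim)]
    axis = [1.0] * dim  # assumes this start vector is not orthogonal to the axis
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * axis[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(a * a for a in nxt))
        if norm == 0.0:
            break
        axis = [a / norm for a in nxt]
    return c, axis


def bisect(vectors):
    """Split a cluster with a hyperplane perpendicular to the principal axis.
    Simplification: the plane passes through the centroid."""
    c, axis = principal_axis(vectors)
    left, right = [], []
    for v in vectors:
        projection = sum((x - m) * a for x, m, a in zip(v, c, axis))
        (left if projection < 0 else right).append(v)
    return left, right


# Hypothetical 2-D feature vectors spread mainly along the first coordinate.
points = [[-2.0, 0.1], [-1.5, -0.1], [1.5, 0.1], [2.0, -0.1]]
left, right = bisect(points)
```

Repeating `bisect` on whichever cluster gains the largest distortion reduction, until the total distortion is small enough, would give the successive scheme the abstract outlines.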

Characteristics of Trend and Pattern for Water Quality Monitoring Networks Data using Seasonal-kendall, SOM and RDA on the Mulgeum in the Nakdong River (경향성 및 패턴 분석을 이용한 낙동강 물금지역의 수질 특성)

  • Ahn, Jung-Min;Lee, In-Jung;Jung, Kang-Young;Kim, Jueon;Lee, Kwonchul;Cheon, Seuk;Lyu, Siwan
    • Journal of Environmental Science International / v.25 no.3 / pp.361-371 / 2016
  • The Ministry of Environment operates a water quality monitoring network to obtain basic data for water environment policies and to comprehensively understand the water quality status of public water bodies such as rivers and lakes. Because the observed water quality data fluctuate seasonally, it is important to analyze them with statistical methods. In particular, monthly water quality data change periodically with the seasons, so this periodicity has to be taken into account when analyzing changes over time. In this study, trend, SOM and RDA analyses were performed for the Mulgeum station on the Nakdong River using water quality data for temperature, BOD, COD, pH, SS, T-N, T-P, Chl-a and coliform bacteria observed from 1989 to 2013. The trend, SOM and RDA results indicate that water quality at the Mulgeum station has improved, but caution is still required to ensure a safe water supply because concentrations were highest in early spring (January to March).
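
The seasonal Kendall trend test named in the title handles seasonality by computing the Mann-Kendall statistic within each season and summing across seasons. The sketch below shows that idea under simplifying assumptions (no tied values, independent seasons); the function names and the sample values are hypothetical and are not data from the study.

```python
import math


def mann_kendall_s(values):
    """Mann-Kendall S: number of increasing pairs minus number of decreasing pairs."""
    s = 0
    for i in range(len(values) - 1):
        for j in range(i + 1, len(values)):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    return s


def seasonal_kendall(series_by_season):
    """Sum S and its variance over seasons and return (S, Z).
    Assumes no tied values and independence between seasons."""
    total_s, total_var = 0, 0.0
    for values in series_by_season.values():
        n = len(values)
        total_s += mann_kendall_s(values)
        total_var += n * (n - 1) * (2 * n + 5) / 18.0
    if total_s == 0 or total_var == 0:
        return total_s, 0.0
    # Continuity correction of one unit toward zero.
    z = (total_s - math.copysign(1, total_s)) / math.sqrt(total_var)
    return total_s, z


# Hypothetical yearly concentrations for two months.
sample = {"Jan": [4.0, 3.5, 3.1, 2.8], "Jul": [2.9, 2.5, 2.6, 2.0]}
s_total, z_score = seasonal_kendall(sample)
```

A negative Z indicates a decreasing trend once each month is only compared with the same month in other years.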

Nonstandard Machine Learning Algorithms for Microarray Data Mining

  • Zhang, Byoung-Tak
    • Proceedings of the Korean Society for Bioinformatics Conference / 2001.10a / pp.165-196 / 2001
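  • A DNA chip, or microarray, is a technology in which a large number of genes or gene fragments (usually thousands to tens of thousands) are fixed on a chip and the expression patterns of the genes are analyzed using the DNA hybridization reaction. This high-throughput technology can not only answer various problems in molecular biology that were previously unthinkable, but also has unlimited application potential, including molecular-level disease diagnosis, new drug development and solutions to environmental pollution problems. For practical application of this technology, in addition to the hardware/wetware technology for fabricating DNA chips, bioinformatics technology for creating as much useful and new knowledge as possible from such data is essential. The problem of data mining gene expression patterns can be broadly divided into clustering, classification and dependency analysis, and these techniques are based on statistics and on machine learning in artificial intelligence. The methods mainly used include principal component analysis, hierarchical clustering, k-means, self-organizing maps, decision trees, multilayer perceptron neural networks and association rules. In this seminar, beyond these basic machine learning techniques, we introduce the probabilistic graphical model (PGM) as a new learning technique under recent study and examine research applying it to DNA chip data analysis. A PGM is a machine learning model formed by combining artificial neural networks, graph theory and probability theory; it is based on the memory and learning mechanisms of the human brain, and one of its major differences from other machine learning models is that it is a generative model. That is, once the model is built, it can generate new data, so the model can be verified and new facts can be inferred from it, which makes it suitable for exploratory analysis aimed at discovering new knowledge, as in biological data mining problems. In addition, unlike conventional neural network models, probabilistic graphical models perform probability-based soft inference rather than deterministic decision making, and are well suited to analyzing causal relationships or dependencies among the relevant factors from the learned model. As concrete examples of PGMs, we introduce the structure and the learning and inference algorithms of the Bayesian network, nonnegative matrix factorization (NMF) and generative topographic mapping (GTM), and examine the results of applying them to the cancer diagnosis problem and the gene-drug dependency analysis problem used in CAMDA-2000 and CAMDA-2001, the DNA chip data analysis evaluation contests.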

Comparison of clustering methods of microarray gene expression data (마이크로어레이 유전자 발현 자료에 대한 군집 방법 비교)

  • Lim, Jin-Soo;Lim, Dong-Hoon
    • Journal of the Korean Data and Information Science Society / v.23 no.1 / pp.39-51 / 2012
  • Cluster analysis has proven to be a useful tool for investigating the association structure among genes and samples in a microarray data set. We applied several cluster validation measures to evaluate the performance of clustering algorithms for analyzing microarray gene expression data, including hierarchical clustering, K-means, PAM, SOM and model-based clustering. The available validation measures fall into three general categories: internal, stability and biological. The performance of the clustering algorithms was evaluated using simulated data and SRBCT microarray data. For the simulated data, nearly every method performed well, recovering the same number of clusters as the number of classes in the original data. For the SRBCT data, the best choice for the number of clusters was less clear than for the simulated data. PAM, SOM and the model-based method gave results similar to those for the simulated data under the Silhouette width, an internal measure, as did PAM and the model-based method under the biological measure, while model-based clustering had the best value of the stability measure.
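
The Silhouette width mentioned above compares, for each point, the mean distance to its own cluster with the mean distance to the nearest other cluster. The sketch below computes the average width for one-dimensional values; the function name and data are hypothetical, and it assumes at least two clusters and at least two distinct values.

```python
def mean_silhouette(points, labels):
    """Average silhouette width for 1-D points; singleton clusters score 0."""
    def mean_dist(i, members):
        return sum(abs(points[i] - points[j]) for j in members) / len(members)

    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within the own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical values with an obvious two-group structure.
values = [0.0, 1.0, 10.0, 11.0]
good = mean_silhouette(values, [0, 0, 1, 1])
bad = mean_silhouette(values, [0, 1, 0, 1])
```

Values near 1 indicate compact, well-separated clusters, while negative values suggest points assigned to the wrong cluster.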

Identification of the Marker-Genes for Dioxin(2, 3, 7, 8- tetradibenzo-p-dioxin)-Induced Immune Dysfunction by Using the High-Density Oligonucleotide Microarray

  • Kim, Jeong-Ah;Lee, Eun-Ju;Chung, In Hye;Kim, Hyung-Lae
    • Genomics & Informatics / v.2 no.2 / pp.75-80 / 2004
  • In a variety of animal species, perinatal exposure of experimental animals to 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) leads to immune dysfunction that is more severe and persistent than that caused by adult exposure. We report here the changes in gene expression and the identification of marker genes representing dioxin exposure. Transcript expression was analyzed using an 11 K oligonucleotide microarray in bone marrow cells of male C57BL/6J mice after an intraperitoneal injection of $1{\mu}g$ TCDD/kg body weight at various time points: gestational day 6.5 (G6.5), 13.5 (G13.5) and 18.5 (G18.5), and postnatal week 3 (P3W) and 6 (P6W). The self-organizing map (SOM) clusters representing each specific dioxin exposure were identified as follows: G6.5D (C14); G13.5D (C0, C5, C10, C18); G18.5D (C7); P3W (C2, C21); and P6W (C4, C15, C20). The candidate marker genes were restricted to transcripts whose expression changed consistently by more than $\pm$2-fold in three experiments. This yielded 85 candidate genes, which were involved in cell physiology and cell functions such as cell proliferation and immune function. We identified the following biomarker genes for dioxin exposure: smc-like 2 from SOM C14 for exposure at G6.5D; focal adhesion kinase and 6 other genes from C0, and protein tyrosine phosphatase 4a2 and 3 other genes from C5, for G13.5D; platelet factor 4 from C7 for G18.5D; and fos from C2 for P3W.
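
The candidate filter described above (a change of more than 2-fold in all three experiments) can be sketched as below. This is one reading of the criterion, not the authors' pipeline: it assumes the input is a treated/control expression ratio per experiment and that "consistently" means the same direction in every experiment. Gene names and values are hypothetical.

```python
import math


def consistent_fold_change(ratios_by_gene, threshold=2.0):
    """Keep genes whose treated/control ratio exceeds the threshold in the
    same direction (up or down) in every replicate experiment."""
    cutoff = math.log2(threshold)
    selected = {}
    for gene, ratios in ratios_by_gene.items():
        if not ratios:
            continue  # no measurements, nothing to decide
        logs = [math.log2(r) for r in ratios]
        if all(v > cutoff for v in logs):
            selected[gene] = "up"
        elif all(v < -cutoff for v in logs):
            selected[gene] = "down"
    return selected


# Hypothetical expression ratios from three experiments.
example = {
    "gene_a": [2.4, 3.1, 2.2],   # up in all three
    "gene_b": [0.4, 0.3, 0.45],  # down in all three
    "gene_c": [2.5, 1.2, 2.8],   # below threshold once
    "gene_d": [2.6, 0.3, 2.9],   # direction not consistent
}
markers = consistent_fold_change(example)
```

Working on log2 ratios makes the up and down thresholds symmetric, so a 2-fold increase and a 2-fold decrease are treated alike.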