• Title/Summary/Keyword: Hierarchical clustering analysis (HCA)

Hierarchical Cluster Analysis Histogram Thresholding with Local Minima

  • Sengee, Nyamlkhagva;Radnaabazar, Chinzorig;Batsuuri, Suvdaa;Tsedendamba, Khurel-Ochir;Telue, Berekjan
    • Journal of Multimedia Information System / v.4 no.4 / pp.189-194 / 2017
  • In this study, we propose a method based on "Image segmentation by histogram thresholding using hierarchical cluster analysis" (HCA) and "A nonparametric approach for histogram segmentation" (NHS). The HCA method starts with every histogram bin as its own cluster and then reduces the number of clusters by merging them according to a distance metric. Because it starts from so many clusters, it is computationally expensive. To eliminate this disadvantage, we use the NHS method, which is fast and finds all local minima of the histogram, to reduce the initial number of clusters. Our approach combines the two methods to eliminate the disadvantages of the Arifin method. The proposed method requires less computation than the HCA method because it starts from fewer clusters, and it uses the local minima of the histogram computed by NHS.
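
    The combination described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, plateaus in the histogram are ignored, and the merge criterion (difference between the weighted mean gray levels of adjacent clusters) is a simple stand-in assumed here because the abstract does not specify the distance metric.

```python
def local_minima(hist):
    """Indices of interior bins strictly lower than both neighbours."""
    return [i for i in range(1, len(hist) - 1)
            if hist[i - 1] > hist[i] < hist[i + 1]]


def segment_mean(hist, start, end):
    """Weighted mean bin index of hist[start:end]."""
    total = sum(hist[start:end])
    if total == 0:
        return (start + end - 1) / 2  # empty segment: use its midpoint
    return sum(i * hist[i] for i in range(start, end)) / total


def threshold_histogram(hist, n_clusters):
    """Cut the histogram at its local minima, then merge adjacent clusters.

    Starting from the segments between local minima (instead of one
    cluster per bin) keeps the number of merge steps small.
    Returns the bin indices at which the final clusters begin.
    """
    cuts = [0] + local_minima(hist) + [len(hist)]
    clusters = [(cuts[k], cuts[k + 1]) for k in range(len(cuts) - 1)]
    while len(clusters) > n_clusters:
        # Assumed distance: gap between means of neighbouring clusters.
        dists = [abs(segment_mean(hist, *clusters[k + 1])
                     - segment_mean(hist, *clusters[k]))
                 for k in range(len(clusters) - 1)]
        k = dists.index(min(dists))
        clusters[k:k + 2] = [(clusters[k][0], clusters[k + 1][1])]
    return [start for start, _ in clusters[1:]]


if __name__ == "__main__":
    toy_hist = [1, 5, 9, 4, 1, 3, 8, 3, 0, 2, 7, 2]  # made-up counts
    print(local_minima(toy_hist))
    print(threshold_histogram(toy_hist, 2))
```

    Only adjacent clusters are merged, so every cluster remains a contiguous gray-level range and the cluster boundaries can be read directly as thresholds.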

Quality Assessment of Curcuma longa L. by Gas Chromatography-Mass Spectrometry Fingerprint, Principle Components Analysis and Hierarchical Clustering Analysis

  • Li, Ming;Zhou, Xin;Zhao, Yang;Wang, Dao-Ping;Hu, Xiao-Na
    • Bulletin of the Korean Chemical Society / v.30 no.10 / pp.2287-2293 / 2009
  • Gas Chromatography-Mass Spectrometry (GC-MS) fingerprint analysis, Principal Component Analysis (PCA), and Hierarchical Cluster Analysis (HCA) were introduced for quality assessment of Curcuma longa L. (C. longa). The GC-MS fingerprint method was developed and validated by analyzing 33 batches of samples of C. longa from different geographic locations. Eighteen chromatographic peaks were selected as characteristic peaks, and their relative peak areas (RPA) were calculated for quantitative expression. Two principal components (PCs) were extracted by PCA. C. longa collected from Guizhou and Fujian were separated from the other samples by PC1, which captured 71.83% of the variance. PC2 contributed to their further separation, capturing 11.13% of the variance. HCA confirmed the PCA result. Therefore, a GC-MS fingerprint study combined with chemometric techniques provides a very flexible and reliable method for quality assessment of C. longa.
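
    The PCA step on a samples-by-peaks matrix of relative peak areas might look like the sketch below. It assumes NumPy, uses synthetic random data in place of the 33 x 18 RPA matrix (the group sizes and offset are invented for illustration), and assumes autoscaling of each peak, which the abstract does not state.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD on autoscaled data.

    Returns (scores, explained_variance_ratio) for the leading components.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    std = centered.std(axis=0, ddof=1)
    std[std == 0] = 1.0              # guard against constant columns
    Z = centered / std
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    variances = S ** 2 / (Z.shape[0] - 1)
    ratio = variances / variances.sum()
    scores = U[:, :n_components] * S[:n_components]
    return scores, ratio[:n_components]


def make_synthetic_rpa(n_a=12, n_b=21, n_peaks=18, offset=5.0, seed=0):
    """Synthetic stand-in for an RPA table: two groups differing by an offset."""
    rng = np.random.default_rng(seed)
    group_a = rng.normal(loc=offset, scale=1.0, size=(n_a, n_peaks))
    group_b = rng.normal(loc=0.0, scale=1.0, size=(n_b, n_peaks))
    return np.vstack([group_a, group_b])


if __name__ == "__main__":
    X = make_synthetic_rpa()
    scores, ratio = pca(X)
    print("explained variance ratio of PC1, PC2:", ratio)
    print("PC1 scores, first group: ", scores[:12, 0].round(2))
    print("PC1 scores, second group:", scores[12:, 0].round(2))
```

    The explained variance ratio corresponds to the percentages quoted for PC1 and PC2 in the abstract; the numbers printed here come from the synthetic data only and have no relation to the published values.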

Application of Clustering Methods for Interpretation of Petroleum Spectra from Negative-Mode ESI FT-ICR MS

  • Yeo, In-Joon;Lee, Jae-Won;Kim, Sung-Hwan
    • Bulletin of the Korean Chemical Society / v.31 no.11 / pp.3151-3155 / 2010
  • This study was performed to develop analytical methods to better understand the properties and reactivity of petroleum, which is a highly complex organic mixture, using high-resolution mass spectrometry and statistical analysis. Ten crude oil samples were analyzed using negative-mode electrospray ionization Fourier transform ion cyclotron resonance mass spectrometry (ESI FT-ICR MS). Clustering methods, including principal component analysis (PCA), hierarchical clustering analysis (HCA), and k-means clustering, were used to comparatively interpret the spectra. All the methods gave consistent results and showed that oxygen- and sulfur-containing heteroatom species played important roles in clustering samples or peaks. The oxygen-containing samples had higher acidity than the other samples, and the clustering results were linked to properties of the crude oils. This study demonstrated that clustering methods provide a simple and effective way to interpret complex petroleomic data.

Impurity profiling and chemometric analysis of methamphetamine seizures in Korea

  • Shin, Dong Won;Ko, Beom Jun;Cheong, Jae Chul;Lee, Wonho;Kim, Suhkmann;Kim, Jin Young
    • Analytical Science and Technology / v.33 no.2 / pp.98-107 / 2020
  • Methamphetamine (MA) is currently the most abused illicit drug in Korea. MA is produced by chemical synthesis, and the final target drug that is produced contains small amounts of the precursor chemicals, intermediates, and by-products. To identify and quantify these trace compounds in MA seizures, a practical and feasible approach for conducting chromatographic fingerprinting with a suite of traditional chemometric methods and recently introduced machine learning approaches was examined. This was achieved using gas chromatography (GC) coupled with a flame ionization detector (FID) and mass spectrometry (MS). Following appropriate examination of all the peaks in 71 samples, 166 impurities were selected as the characteristic components. Unsupervised (principal component analysis (PCA), hierarchical cluster analysis (HCA), and K-means clustering) and supervised (partial least squares-discriminant analysis (PLS-DA), orthogonal partial least squares-discriminant analysis (OPLS-DA), support vector machines (SVM), and deep neural network (DNN) with Keras) chemometric techniques were employed for classifying the 71 MA seizures. The results of the PCA, HCA, K-means clustering, PLS-DA, OPLS-DA, SVM, and DNN methods for quality evaluation were in good agreement. However, the tested MA seizures possessed distinct features, such as chirality, cutting agents, and boiling points. The study indicated that the established qualitative and semi-quantitative methods will be practical and useful analytical tools for characterizing trace compounds in illicit MA seizures. Moreover, they will provide a statistical basis for identifying the synthesis route, sources of supply, trafficking routes, and connections between seizures, which will support drug law enforcement agencies in their effort to eliminate organized MA crime.

Establishment of discrimination system using multivariate analysis of FT-IR spectroscopy data from different species of artichoke (Cynara cardunculus var. scolymus L.) (FT-IR 스펙트럼 데이터 기반 다변량통계분석기법을 이용한 아티초크의 대사체 수준 품종 분류)

  • Kim, Chun Hwan;Seong, Ki-Cheol;Jung, Young Bin;Lim, Chan Kyu;Moon, Doo Gyung;Song, Seung Yeob
    • Horticultural Science & Technology / v.34 no.2 / pp.324-330 / 2016
  • To determine whether FT-IR spectral analysis combined with multivariate analysis of whole cell extracts can be used to discriminate between artichoke (Cynara cardunculus var. scolymus L.) plants at the metabolic level, leaves of ten artichoke plants were subjected to Fourier transform infrared (FT-IR) spectroscopy. FT-IR spectral data from leaves were analyzed by principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and hierarchical clustering analysis (HCA). FT-IR spectra confirmed typical spectral differences in the frequency regions of 1,700-1,500, 1,500-1,300, and 1,100-950 $cm^{-1}$. These spectral regions reflect the quantitative and qualitative variations of amide I and II from amino acids and proteins (1,700-1,500 $cm^{-1}$), phosphodiester groups from nucleic acids and phospholipids (1,500-1,300 $cm^{-1}$), and carbohydrate compounds (1,100-950 $cm^{-1}$). PCA revealed separate clusters that corresponded to their species relationship. Thus, PCA could be used to distinguish between artichoke species with different metabolite contents. PLS-DA showed a similar species classification of artichoke. Furthermore, these metabolic discrimination systems could be used for the rapid selection and classification of useful artichoke cultivars.

Metabolic Discrimination of Papaya (Carica papaya L.) Leaves Depending on Growth Temperature Using Multivariate Analysis of FT-IR Spectroscopy Data (FT-IR 스펙트럼 다변량통계분석을 이용한 파파야(Carica papaya L.)의 생육온도 변화에 따른 대사체 수준 식별)

  • Jung, Young Bin;Kim, Chun Hwan;Lim, Chan Kyu;Kim, Sung Chel;Song, Kwan Jeong;Song, Seung Yeob
    • Journal of the Korean Society of International Agriculture / v.31 no.4 / pp.378-383 / 2019
  • This study examined whether FT-IR spectral analysis combined with multivariate analysis of whole cell extracts can be used to discriminate papaya at the metabolic level. FT-IR spectral data from leaves were analyzed by principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and hierarchical clustering analysis (HCA). FT-IR spectra confirmed typical spectral differences in the frequency regions of 1,700-1,500, 1,500-1,300, and 1,100-950 $cm^{-1}$. These spectral regions reflect the quantitative and qualitative variations of amide I and II from amino acids and proteins (1,700-1,500 $cm^{-1}$), phosphodiester groups from nucleic acids and phospholipids (1,500-1,300 $cm^{-1}$), and carbohydrate compounds (1,100-950 $cm^{-1}$). PCA showed that papaya leaves could be separated into clusters according to growth temperature. This discrimination reflected differences in metabolite content under the different growth conditions. PLS-DA showed a clearer discrimination pattern than PCA. Furthermore, these metabolic discrimination systems could be applied for rapid selection and classification of useful papaya cultivars.

Rapid discrimination system of Chinese cabbage (Brassica rapa) at metabolic level using Fourier transform infrared spectroscopy (FT-IR) based on multivariate analysis (배추 대사체 추출물의 FT-IR 스펙트럼 및 다변량 통계분석을 통한 계통 신속 식별 체계)

  • Ahn, Myung Suk;Lim, Chan Ju;Song, Seung Yeob;Min, Sung Ran;Lee, In Ho;Nou, Ill-Sup;Kim, Suk Weon
    • Journal of Plant Biotechnology / v.43 no.3 / pp.383-390 / 2016
  • To determine whether FT-IR spectral analysis combined with multivariate analysis could be used to discriminate Chinese cabbage breeding lines at the metabolic level, whole cell extracts of nine different breeding lines (three paternal, three maternal, and three $F_1$ lines) were subjected to Fourier transform infrared spectroscopy (FT-IR). FT-IR spectral data of Chinese cabbage plants were analyzed by principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and hierarchical clustering analysis (HCA). The hierarchical dendrograms based on PLS-DA for two of the three cross combinations showed that samples of the paternal, maternal, and progeny $F_1$ lines were perfectly separated into three branches in a breeding-line-dependent manner. However, one cross combination could not be fully discriminated into three branches. Thus, hierarchical dendrograms based on PLS-DA of FT-IR spectral data of Chinese cabbage breeding lines could be used to represent the most probable chemotaxonomical relationship among maternal, paternal, and $F_1$ plants. Furthermore, these metabolic discrimination systems could be applied for rapid selection and classification of useful Chinese cabbage cultivars.

Establishment of rapid discrimination system of leguminous plants at metabolic level using FT-IR spectroscopy with multivariate analysis (FT-IR 스펙트럼 기반 다변량통계분석기법에 의한 두과작물의 대사체 수준 식별체계 확립)

  • Song, Seung-Yeob;Ha, Tae-Joung;Jang, Ki-Chang;Kim, In-Jung;Kim, Suk-Weon
    • Journal of Plant Biotechnology / v.39 no.3 / pp.121-126 / 2012
  • To determine whether FT-IR spectroscopy combined with multivariate analysis of whole cell extracts can be used to discriminate major leguminous plants at the metabolic level, seed extracts of six leguminous plants were subjected to Fourier transform infrared spectroscopy (FT-IR). FT-IR spectral data from seed extracts were analyzed by principal component analysis (PCA), partial least squares discriminant analysis (PLS-DA), and hierarchical clustering analysis (HCA). PCA could not fully discriminate the six leguminous plants, whereas PLS-DA discriminated them successfully. The hierarchical dendrogram based on PLS-DA separated the six leguminous plants into four branches. The first branch consisted of all three Vigna species: Vigna radiata var. radiata, Vigna angularis var. angularis, and Vigna unguiculata subsp. unguiculata. Pisum sativum var. sativum, Glycine max L., and Phaseolus vulgaris var. vulgaris were each clustered into a separate branch. The overall results showed that the metabolic discrimination system was in accordance with known phylogenetic taxonomy. Thus, we suggest that the hierarchical dendrogram based on PLS-DA of FT-IR spectral data from seed extracts represents the most probable chemotaxonomical relationship between the six leguminous plants.

Analysis of Species Assemblages Caught by Large-pair-Trawler in Korean Waters (한국 근해 쌍끌이대형저인망어업의 어종군집 분석)

  • Lee, Dong-Woo;Lee, Jae-Bong;Kim, Yeong-Hye;Kang, Su-Kyung
    • Korean Journal of Fisheries and Aquatic Sciences / v.44 no.5 / pp.499-505 / 2011
  • The fishing grounds of Korean large-pair trawlers have shifted since exclusive economic zones (EEZs) were established in a fisheries agreement involving countries neighboring Korean waters. The distributions of marine ecosystems and fisheries resources have been changing with environmental changes such as global warming and with the shift in species targeted as a result of changes in fishing technology and fishing gear. This study analyzed variation in the species assemblages caught in Korean waters by large-pair trawlers as a result of these geopolitical and environmental changes. The data used in this study were obtained from the Fishery Production Statistics of Korea and the Port Sample Survey of the National Fisheries Research and Development Institute (NFRDI) from 1990 to 2007. Hierarchical clustering analysis (HCA) and correspondence analysis (CA) were used to explore the characteristics of the catch-species composition. The overall variation in the species composition of the catch of Korean large-pair trawlers showed that the proportions of croaker Johnius grypotus, small yellow croaker Larimichthys polyactis, eel Anguilla japonica, and blue crab Portunus trituberculatus decreased, whereas those of hairtail Trichiurus lepturus, Spanish mackerel Scomberomorus niphonius, anchovy Engraulis japonicus, and common squid Todarodes pacificus increased in Korean waters over the 18-year period. The results of the HCA of the annual catch data by species showed four different distributions of fish species according to year. Results of the CA showed that the species assemblages differed between the 1990s and 2000s.
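
    Grouping years by the species composition of their catch, as the HCA in this abstract does, can be illustrated with a small agglomerative clustering sketch. The Bray-Curtis dissimilarity, the average linkage, the function names, and the toy proportions are all assumptions made for illustration; the abstract does not state which distance or linkage was used, and the numbers below are not data from the study.

```python
from itertools import combinations


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two non-negative composition vectors."""
    num = sum(abs(a - b) for a, b in zip(p, q))
    den = sum(a + b for a, b in zip(p, q))
    return num / den if den else 0.0


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage.

    profiles: dict mapping a label (e.g. a year) to its composition vector.
    Returns a list of clusters, each a sorted list of labels.
    """
    clusters = [[label] for label in profiles]

    def dist(c1, c2):
        # Average of all pairwise dissimilarities between the two clusters.
        total = sum(bray_curtis(profiles[a], profiles[b])
                    for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair; i < j, so deleting j is safe.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Made-up catch proportions for three hypothetical species groups.
    toy = {
        1990: [0.60, 0.30, 0.10],
        1991: [0.55, 0.35, 0.10],
        2006: [0.20, 0.30, 0.50],
        2007: [0.15, 0.30, 0.55],
    }
    print(average_linkage(toy, 2))
```

    Cutting the hierarchy at a chosen number of clusters, as done here with n_clusters, is one way to obtain year groups such as the four reported in the abstract.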

Comparison of 12 Isoflavone Profiles of Soybean (Glycine max (L.) Merrill) Seed Sprouts from Three Different Countries

  • Park, Soo-Yun;Kim, Jae Kwang;Kim, Eun-Hye;Kim, Seung-Hyun;Prabakaran, Mayakrishnan;Chung, Ill-Min
    • KOREAN JOURNAL OF CROP SCIENCE / v.63 no.4 / pp.360-377 / 2018
  • The levels of 12 isoflavones were measured in soybean (Glycine max (L.) Merrill) sprouts of 68 genetic varieties from three countries (China, Japan, and Korea). The isoflavone profile differences were analyzed using data mining methods. A principal component analysis (PCA) revealed that the CSRV021 variety was separated from the others by the first two principal components. This variety appears to be most suited for functional food production due to its high isoflavone levels. Partial least squares discriminant analysis (PLS-DA) and orthogonal projections to latent structures discriminant analysis (OPLS-DA) showed that there are meaningful isoflavone compositional differences in samples that have different countries of origin. Hierarchical clustering analysis (HCA) of these phytochemicals resulted in clusters derived from closely related biochemical pathways. These results indicate the usefulness of metabolite profiling combined with chemometrics as a tool for assessing the quality of foods and identifying metabolic links in biological systems.