• Title/Summary/Keyword: Partial least square discriminant analysis


Discrimination of Alismatis Rhizoma According to Geographical Origins using Near Infrared Spectroscopy (근적외선분광법을 이용한 택사의 산지 판별법 연구)

  • Lee, Dong Young;Kim, Seung Hyun;Kim, Hyo Jin;Sung, Sang Hyun
    • Korean Journal of Pharmacognosy / v.44 no.4 / pp.344-349 / 2013
  • Near infrared spectroscopy (NIRS) combined with multivariate analysis was used to discriminate the geographical origin of Alisma orientale from Korea (n=94) and China (n=72). Two-thirds of the samples were randomly selected for the training set and the remaining one-third for the test set. The NIR spectra were pretreated with a second-derivative transform. Partial least square discriminant analysis (PLS-DA) models correctly discriminated 100% of the Korean and Chinese A. orientale samples. These results demonstrate the potential of NIR spectroscopy combined with multivariate analysis as a rapid and accurate method for discriminating A. orientale according to its geographical origin.
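The split and pretreatment described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not say whether the random split was done per class or which derivative algorithm was used, so the per-class split, the plain central finite difference (rather than, say, a Savitzky-Golay filter), the function names, and the toy spectrum are all assumptions made for illustration.

```python
import random


def split_two_thirds(n_samples, seed=0):
    """Randomly assign about two-thirds of the indices to training, the rest to test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(2 * n_samples / 3)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


def second_derivative(spectrum):
    """Central finite-difference second derivative of an evenly spaced spectrum.

    Returns len(spectrum) - 2 values because the two end points are dropped.
    """
    return [spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]
            for i in range(1, len(spectrum) - 1)]


# Class sizes are taken from the abstract; splitting each class separately is an assumption.
korea_train, korea_test = split_two_thirds(94, seed=1)
china_train, china_test = split_two_thirds(72, seed=2)
print(len(korea_train), len(korea_test))  # 63 31
print(len(china_train), len(china_test))  # 48 24

# Hypothetical spectrum: a linear baseline (0.5 * i) plus a quadratic term (i * i).
toy = [0.5 * i + i * i for i in range(6)]
print(second_derivative(toy))  # [2.0, 2.0, 2.0, 2.0]
```

The toy output shows why a second derivative is a common pretreatment: the linear baseline term vanishes and only the curvature remains.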

Hyperspectral Imaging and Partial Least Square Discriminant Analysis for Geographical Origin Discrimination of White Rice

  • Mo, Changyeun;Lim, Jongguk;Kwon, Sung Won;Lim, Dong Kyu;Kim, Moon S.;Kim, Giyoung;Kang, Jungsook;Kwon, Kyung-Do;Cho, Byoung-Kwan
    • Journal of Biosystems Engineering / v.42 no.4 / pp.293-300 / 2017
  • Purpose: This study aims to propose a method for fast geographical origin discrimination between domestic and imported rice using a visible/near-infrared (VNIR) hyperspectral imaging technique. Methods: Hyperspectral reflectance images of South Korean and Chinese rice samples were obtained in the range of 400 nm to 1000 nm. Partial least square discriminant analysis (PLS-DA) models were developed and applied to the acquired images to determine the geographical origin of the rice samples. Results: The optimal pixel dimensions and spectral pretreatment conditions for the hyperspectral images were identified to improve the discrimination accuracy. The results revealed that the highest accuracy was achieved when the hyperspectral image's pixel dimension was $3.0\,\mathrm{mm} \times 3.0\,\mathrm{mm}$. Furthermore, the geographical origin discrimination models achieved a discrimination accuracy of over 99.99% upon application of a first-order derivative, second-order derivative, maximum normalization, or baseline pretreatment. Conclusions: The results demonstrated that the VNIR hyperspectral imaging technique can be used to discriminate geographical origins of rice.

Development of On-line Sorting System for Detection of Infected Seed Potatoes Using Visible Near-Infrared Transmittance Spectral Technique (가시광 및 근적외선 투과분광법을 이용한 감염 씨감자 온라인 선별시스템 개발)

  • Kim, Dae Yong;Mo, Changyeun;Kang, Jun-Soon;Cho, Byoung-Kwan
    • Journal of the Korean Society for Nondestructive Testing / v.35 no.1 / pp.1-11 / 2015
  • In this study, an online seed potato sorting system using a visible and near infrared (40-1100 nm) transmittance spectral technique and a statistical model was evaluated for the nondestructive discrimination of infected and sound seed potatoes. Seed potatoes that had been artificially infected with Pectobacterium atrosepticum, which is known to cause a soil-borne disease, were prepared for the experiments. After transmittance spectra were acquired from sound and infected seed potatoes, an algorithm for detecting infected seed potatoes was developed using the partial least square discriminant analysis method. The coefficient of determination ($R^2_p$) of the prediction model was 0.943, and the classification accuracy was above 99% (n = 80) for discriminating diseased seed potatoes from sound ones. This online sorting system has good potential as a basis for techniques that detect agricultural products infected or contaminated by pathogens.
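As a rough illustration of what a two-class PLS-DA model does, the following Python sketch fits a PLS1 regression with the NIPALS algorithm on 0/1 class labels and thresholds the predicted value at 0.5. It is a sketch under stated assumptions, not the system described in the abstract: the function names, the 0.5 threshold, the single latent component, and the three-variable toy "spectra" are all hypothetical.

```python
def fit_pls_da(X, y, n_components=1):
    """Fit a two-class PLS-DA model (PLS1 via NIPALS) on 0/1 class labels."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered predictors
    f = [yi - y_mean for yi in y]                               # centered labels
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def predict_pls_da(model, x, threshold=0.5):
    """Return (predicted class, continuous PLS prediction) for one sample."""
    x_mean, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p)]
    return (1 if y_hat >= threshold else 0), y_hat


# Hypothetical data: 0 = sound, 1 = infected; the first variable separates the classes.
X = [[1.0, 0.2, 0.1], [1.1, 0.1, 0.2], [0.9, 0.3, 0.1],
     [2.0, 0.2, 0.1], [2.1, 0.1, 0.2], [1.9, 0.3, 0.1]]
y = [0, 0, 0, 1, 1, 1]
model = fit_pls_da(X, y, n_components=1)
print([predict_pls_da(model, row)[0] for row in X])  # [0, 0, 0, 1, 1, 1]
print(predict_pls_da(model, [1.95, 0.2, 0.15])[0])   # 1
```

Prediction deflates the new sample component by component, which avoids computing the regression coefficient matrix explicitly; for real spectra a tested library implementation and cross-validated choice of the number of components would be preferable.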

Establishment of discrimination system using multivariate analysis of FT-IR spectroscopy data from different species of artichoke (Cynara cardunculus var. scolymus L.) (FT-IR 스펙트럼 데이터 기반 다변량통계분석기법을 이용한 아티초크의 대사체 수준 품종 분류)

  • Kim, Chun Hwan;Seong, Ki-Cheol;Jung, Young Bin;Lim, Chan Kyu;Moon, Doo Gyung;Song, Seung Yeob
    • Horticultural Science & Technology / v.34 no.2 / pp.324-330 / 2016
  • To determine whether FT-IR spectral analysis of whole cell extracts, combined with multivariate analysis, can be used to discriminate between artichoke (Cynara cardunculus var. scolymus L.) plants at the metabolic level, leaves of ten artichoke plants were subjected to Fourier transform infrared (FT-IR) spectroscopy. FT-IR spectral data from leaves were analyzed by principal component analysis (PCA), partial least square discriminant analysis (PLS-DA) and hierarchical clustering analysis (HCA). The FT-IR spectra showed typical spectral differences in the frequency regions of $1,700-1,500\,\mathrm{cm}^{-1}$, $1,500-1,300\,\mathrm{cm}^{-1}$ and $1,100-950\,\mathrm{cm}^{-1}$. These spectral regions reflect the quantitative and qualitative variations of amide I and II from amino acids and proteins ($1,700-1,500\,\mathrm{cm}^{-1}$), phosphodiester groups from nucleic acids and phospholipids ($1,500-1,300\,\mathrm{cm}^{-1}$) and carbohydrate compounds ($1,100-950\,\mathrm{cm}^{-1}$). PCA revealed separate clusters that corresponded to the species relationships. Thus, PCA could be used to distinguish between artichoke species with different metabolite contents. PLS-DA showed a similar species classification of artichoke. Furthermore, these metabolic discrimination systems could be used for the rapid selection and classification of useful artichoke cultivars.

Multivariate Procedure for Variable Selection and Classification of High Dimensional Heterogeneous Data

  • Mehmood, Tahir;Rasheed, Zahid
    • Communications for Statistical Applications and Methods / v.22 no.6 / pp.575-587 / 2015
  • Developments in data collection techniques produce high dimensional data sets, in which discrimination is an important and commonly encountered problem that is crucial to resolve when the data are heterogeneous (the classes do not share a common variance-covariance structure). An example is the classification of microbial habitat preferences based on codon/bi-codon usage. Habitat preference is important for studying evolutionary genetic relationships and may help industry produce specific enzymes. Most classification procedures assume homogeneity (a common variance-covariance structure for all classes), which is not guaranteed in most high dimensional data sets. We introduce regularized elimination in partial least square coupled with QDA (rePLS-QDA) for parsimonious variable selection and classification of high dimensional heterogeneous data sets. It is based on the recently introduced regularized elimination for variable selection in partial least square (rePLS) and on quadratic discriminant analysis (QDA), a classification procedure for heterogeneous data. The proposed and existing methods are compared on a simulated data set; in addition, the proposed procedure is applied to classify microbial habitat preferences by their codon/bi-codon usage. Five bacterial habitats (Aquatic, Host Associated, Multiple, Specialized and Terrestrial) are modeled. The classification accuracy for each habitat is satisfactory and ranges from 89.1% to 100% on test data. Codons and bi-codons of interest, together with the mutual interactions that influence the respective habitat preferences, are identified. The proposed method also produced results that concur with known biological characteristics, which will help researchers better understand the divergence of species.

Utilization of R Program for the Partial Least Square Model: Comparison of SmartPLS and R (부분최소제곱모형을 위한 R 프로그램의 활용: SmartPLS와 R의 비교)

  • Kim, Yong-Tae;Lee, Sang-Jun
    • Journal of Digital Convergence / v.13 no.12 / pp.117-124 / 2015
  • As the acceptance of statistical analysis has increased because of Big Data, the need for advanced second-generation statistical methods such as the Structural Equation Model (SEM) is also increasing. This study suggests how R, as open software, can be used when the Partial Least Square Model, one of the SEMs, is applied to statistical analysis. R is free software that is part of the GNU Project, as well as a powerful and useful tool for statistical analysis, including Big Data. The study used R and SmartPLS, a representative statistical package for PLS-SEM, to analyze the internal consistency reliability, convergent validity, and discriminant validity of the measurement model. The study also analyzed the path coefficients and moderator effects of the structural model and compared the results from the two programs. The results indicated that R produced the same results as SmartPLS for both the measurement model and the structural model. Therefore, the study confirmed that R could be a powerful alternative to a commercial statistical package in the future.

Development of Non-Destructive Sorting Technique for Viability of Watermelon Seed by Using Hyperspectral Image Processing (초분광 영상기술을 이용한 수박종자 발아여부 비파괴 선별기술 개발)

  • Bae, Hyungjin;Seo, Young-Wook;Kim, Dae-Yong;Lohumi, Santosh;Park, Eunsoo;Cho, Byoung-Kwan
    • Journal of the Korean Society for Nondestructive Testing / v.36 no.1 / pp.35-44 / 2016
  • Seed viability is one of the most important parameters directly related to seed germination performance and seedling emergence. In this study, a hyperspectral imaging (HSI) system with a range of 1000-2500 nm was used to classify viable watermelon seeds from nonviable seeds. In order to obtain nonviable watermelon seeds, a total of 96 seeds were artificially aged by immersing the seeds in hot water ($25^{\circ}\mathrm{C}$) for 15 days. Further, hyperspectral images of 192 seeds (96 normal and 96 aged) were acquired using the developed HSI system. A germination test was performed on all 192 seeds in order to confirm their viability. Spectral data from the hyperspectral images of the seeds were extracted by selecting pixels from the region of interest. Each seed spectrum was averaged and preprocessed to develop a partial least square discriminant analysis (PLS-DA) classification model. The developed PLS-DA model showed a classification accuracy of 94.7% for the calibration set and 84.2% for the validation set. The results demonstrate that the proposed technique can classify viable and nonviable watermelon seeds with reasonable accuracy, and that it can be further developed into an online sorting system for rapid and nondestructive classification of watermelon seeds with regard to viability.

Pattern Recognition for Typification of Whiskies and Brandies in the Volatile Components using Gas Chromatographic Data

  • Myoung, Sungmin;Oh, Chang-Hwan
    • Journal of the Korea Society of Computer and Information / v.21 no.5 / pp.167-175 / 2016
  • The volatile component analysis of 82 commercial liquors (44 samples of single malt whisky, 20 samples of blended whisky and 18 samples of brandy) was carried out by gas chromatography after liquid-liquid extraction with dichloromethane. Pattern recognition techniques such as principal component analysis (PCA), cluster analysis (CA), linear discriminant analysis (LDA) and partial least square discriminant analysis (PLSDA) were applied for the discrimination of the different liquor categories. Classification rules were validated by considering the sensitivity and specificity of each class. Both LDA and PLSDA gave 100% sensitivity and specificity for all of the categories. These results suggest that common characteristics and identities that typify whiskies and brandies can be found using multivariate data analysis methods.
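Per-class sensitivity and specificity, as used for validation in this abstract, can be computed one class at a time by treating that class as "positive" and all others as "negative". The Python sketch below is illustrative only: the class sizes come from the abstract, but the function name, the label strings, and the imperfect prediction vector are hypothetical and not results from the paper.

```python
def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp / (tp + fn), tn / (tn + fp)


# Class sizes from the abstract: 44 single malt, 20 blended, 18 brandy.
y_true = ["single_malt"] * 44 + ["blended"] * 20 + ["brandy"] * 18

# A perfect classifier gives 100% sensitivity and specificity for every class.
for label in ("single_malt", "blended", "brandy"):
    print(label, sensitivity_specificity(y_true, list(y_true), label))  # (1.0, 1.0)

# Hypothetical imperfect case: two blended whiskies predicted as single malt.
y_pred = list(y_true)
y_pred[44] = y_pred[45] = "single_malt"
sens_blended, spec_blended = sensitivity_specificity(y_true, y_pred, "blended")
sens_malt, spec_malt = sensitivity_specificity(y_true, y_pred, "single_malt")
print(sens_blended, spec_blended)  # 0.9 1.0
print(sens_malt, round(spec_malt, 3))  # 1.0 0.947
```

In the hypothetical case, the two errors lower the sensitivity of the blended class (18 of 20 detected) and the specificity of the single malt class (36 of 38 non-malt samples rejected), which is why both measures are reported per class.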

Prediction and discrimination of taxonomic relationship within Orostachys species using FT-IR spectroscopy combined by multivariate analysis (FT-IR 스펙트럼 데이터의 다변량 통계분석 기법을 이용한 바위솔속 식물의 분류학적 유연관계 예측 및 판별)

  • Kwon, Yong-Kook;Kim, Suk-Weon;Seo, Jung-Min;Woo, Tae-Ha;Liu, Jang-Ryol
    • Journal of Plant Biotechnology / v.38 no.1 / pp.9-14 / 2011
  • To determine whether pattern recognition based on metabolite fingerprinting of whole cell extracts can be used to discriminate cultivars metabolically, leaves of nine commercial Orostachys plants were subjected to Fourier transform infrared (FT-IR) spectroscopy. FT-IR spectral data from leaves were analyzed by principal component analysis (PCA) and partial least square discriminant analysis (PLS-DA). The dendrogram based on hierarchical clustering analysis of these PLS-DA data separated the nine Orostachys species into five major groups. The first group consisted of O. iwarenge 'Yimge', 'Jeju', 'Jeongsun' and O. margaritifolius 'Jinju', whereas in the second group, 'Sacheon' was clustered with 'Busan', both of which belong to O. malacophylla. However, 'Samchuk', which also belongs to O. malacophylla, was not clustered with the other O. malacophylla plants. In addition, O. minuta and O. japonica were separated from the other Orostachys plants. Thus, we suggest that the hierarchical dendrogram based on PLS-DA of FT-IR spectral data from leaves represents the most probable chemotaxonomical relationship between commercial Orostachys plants. Furthermore, these metabolic discrimination systems could be applied to re-establish a precise taxonomic classification of commercial Orostachys plants.

Geographical Classification of Angelica gigas using UHPLC-DAD Combined Multivariate Analyses (UHPLC-DAD 및 다변량분석법을 이용한 참당귀의 산지감별법 연구)

  • Kim, Jung-Ryul;Lee, Dong Young;Sung, Sang Hyun;Kim, Jinwoong
    • Korean Journal of Pharmacognosy / v.44 no.4 / pp.332-335 / 2013
  • Geographical classification of A. gigas was performed in the present study using UHPLC-DAD combined with multivariate data analysis techniques. Six active constituents were isolated from A. gigas: nodakenin, marmesin, decursinol, demethylsuberosin, decursin and decursinol angelate. These constituents were determined simultaneously in 168 A. gigas samples using UHPLC-DAD. Principal component analysis (PCA) and partial least square discriminant analysis (PLS-DA) were used to classify the samples according to geographical origin (Korea and China). Using PLS-DA Y prediction, samples from Korea and China were correctly classified at rates of 81.6% and 93.8%, respectively. This result demonstrates the potential of UHPLC-DAD combined with multivariate analysis techniques as an accurate and rapid method for classifying A. gigas according to its geographical origin.