• Title/Summary/Keyword: Standard error of prediction


Corporate Bond Rating Using Various Multiclass Support Vector Machines (다양한 다분류 SVM을 적용한 기업채권평가)

  • Ahn, Hyun-Chul;Kim, Kyoung-Jae
    • Asia pacific journal of information systems / v.19 no.2 / pp.157-178 / 2009
  • Corporate credit rating is a very important factor in the market for corporate debt. Information concerning corporate operations is often disseminated to market participants through the changes in credit ratings that are published by professional rating agencies, such as Standard and Poor's (S&P) and Moody's Investor Service. Since these agencies generally require a large fee for the service, and the periodically provided ratings sometimes do not reflect the default risk of the company at the time, it may be advantageous for bond-market participants to be able to classify credit ratings before the agencies actually publish them. As a result, it is very important for companies (especially, financial companies) to develop a proper model of credit rating. From a technical perspective, the credit rating constitutes a typical, multiclass, classification problem because rating agencies generally have ten or more categories of ratings. For example, S&P's ratings range from AAA for the highest-quality bonds to D for the lowest-quality bonds. The professional rating agencies emphasize the importance of analysts' subjective judgments in the determination of credit ratings. However, in practice, a mathematical model that uses the financial variables of companies plays an important role in determining credit ratings, since it is convenient to apply and cost efficient. These financial variables include the ratios that represent a company's leverage status, liquidity status, and profitability status. Several statistical and artificial intelligence (AI) techniques have been applied as tools for predicting credit ratings. Among them, artificial neural networks are most prevalent in the area of finance because of their broad applicability to many business problems and their preeminent ability to adapt. 
However, artificial neural networks also have many defects, including the difficulty of determining the values of the control parameters and the number of processing elements in each layer, as well as the risk of over-fitting. Of late, because of their robustness and high accuracy, support vector machines (SVMs) have become popular as a solution for problems that require accurate prediction. An SVM's solution may be globally optimal because SVMs seek to minimize structural risk. On the other hand, artificial neural network models may tend to find locally optimal solutions because they seek to minimize empirical risk. In addition, no parameters need to be tuned in SVMs, barring the upper bound for non-separable cases in linear SVMs. However, since SVMs were originally devised for binary classification, they are not intrinsically geared for multiclass classification problems such as credit rating. Thus, researchers have tried to extend the original SVM to multiclass classification. Hitherto, a variety of techniques for extending standard SVMs to multiclass SVMs (MSVMs) have been proposed in the literature. However, only a few types of MSVM have been tested in prior studies that apply MSVMs to credit rating. In this study, we examined six different MSVM techniques: (1) One-Against-One, (2) One-Against-All, (3) DAGSVM, (4) ECOC, (5) the method of Weston and Watkins, and (6) the method of Crammer and Singer. In addition, we examined the prediction accuracy of some modified versions of conventional MSVM techniques. To find the most appropriate MSVM technique for corporate bond rating, we applied all the techniques to a real-world case of credit rating in Korea. The best application is in corporate bond rating, which is the most frequently studied area of credit rating for specific debt issues or other financial obligations. For our study, the research data were collected from National Information and Credit Evaluation, Inc., a major bond-rating company in Korea. 
The data set is comprised of the bond-ratings for the year 2002 and various financial variables for 1,295 companies from the manufacturing industry in Korea. We compared the results of these techniques with one another, and with those of traditional methods for credit ratings, such as multiple discriminant analysis (MDA), multinomial logistic regression (MLOGIT), and artificial neural networks (ANNs). As a result, we found that DAGSVM with an ordered list was the best approach for the prediction of bond rating. In addition, we found that the modified version of ECOC approach can yield higher prediction accuracy for the cases showing clear patterns.
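
As a rough illustration of the One-Against-One scheme named in the abstract above, the sketch below trains one binary model per pair of classes and lets the models vote. It is not the authors' implementation: the binary learner is a nearest-centroid rule standing in for a binary SVM, and the function names, the two features, and the toy ratings are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_pairwise(X, y):
    """Train one binary model for every pair of classes (one-against-one).

    Assumption: the binary learner is a nearest-centroid rule, used here
    only as a stand-in for a binary SVM.
    """
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        ca = centroid([x for x, t in zip(X, y) if t == a])
        cb = centroid([x for x, t in zip(X, y) if t == b])
        models[(a, b)] = (ca, cb)
    return models


def predict_ovo(models, x):
    """Each pairwise model casts one vote; the class with most votes wins."""
    votes = Counter()
    for (a, b), (ca, cb) in models.items():
        votes[a if sq_dist(x, ca) <= sq_dist(x, cb) else b] += 1
    # Ties are broken in favour of the smallest label, for determinism.
    return max(sorted(votes), key=lambda c: votes[c])


# Hypothetical toy data: two financial ratios per company, three ratings.
X = [[0.2, 0.9], [0.3, 0.8], [0.5, 0.5], [0.6, 0.4], [0.9, 0.1], [0.8, 0.2]]
y = ["A", "A", "B", "B", "C", "C"]
models = train_pairwise(X, y)
```

With k classes this scheme trains k(k-1)/2 binary models, so the three toy ratings above yield three models; One-Against-All would instead train k models, each separating one class from the rest.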

Quantitative Analysis of Carbohydrate, Protein, and Oil Contents of Korean Foods Using Near-Infrared Reflectance Spectroscopy (근적외 분광분석법을 이용한 국내 유통 식품 함유 탄수화물, 단백질 및 지방의 정량 분석)

  • Song, Lee-Seul;Kim, Young-Hak;Kim, Gi-Ppeum;Ahn, Kyung-Geun;Hwang, Young-Sun;Kang, In-Kyu;Yoon, Sung-Won;Lee, Junsoo;Shin, Ki-Yong;Lee, Woo-Young;Cho, Young Sook;Choung, Myoung-Gun
    • Journal of the Korean Society of Food Science and Nutrition / v.43 no.3 / pp.425-430 / 2014
  • Foods contain various nutrients such as carbohydrates, protein, oil, vitamins, and minerals. Among them, carbohydrates, protein, and oil are the main constituents of foods. Usually, these constituents are analyzed by methods such as the Kjeldahl and Soxhlet methods. However, these analytical methods are complex, costly, and time-consuming. Thus, this study aimed to rapidly and effectively analyze carbohydrate, protein, and oil contents with near-infrared reflectance spectroscopy (NIRS). A total of 517 food samples were measured within the wavelength range of 400 to 2,500 nm. Exactly 412 food calibration samples and 162 validation samples were used for NIRS equation development and validation, respectively. For carbohydrates, the most accurate NIRS equation was obtained under 1, 4, 5, 1 (1st derivative, 4 nm gap, 5 points smoothing, and 1 point second smoothing) math treatment conditions using the weighted MSC (multiplicative scatter correction) scatter correction method with MPLS (modified partial least squares) regression. For protein and oil, the best equations were obtained under 2, 5, 5, 3 and 1, 1, 1, 1 conditions, respectively, using standard MSC and standard normal variate only scatter correction methods with MPLS regression. Calibrations of these NIRS equations showed a very high coefficient of determination in calibration ($R^2$: carbohydrates, 0.971; protein, 0.974; oil, 0.937) and a low standard error of calibration (carbohydrates, 4.066; protein, 1.080; oil, 1.890). Optimal equation conditions were applied to a validation set of 162 samples. Validation results of these NIRS equations showed a very high coefficient of determination in prediction ($r^2$: carbohydrates, 0.987; protein, 0.970; oil, 0.947) and a low standard error of prediction (carbohydrates, 2.515; protein, 1.144; oil, 1.370). Therefore, these NIRS equations are applicable to the determination of carbohydrate, protein, and oil contents in various foods.
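
The validation statistics quoted in abstracts like the one above can be sketched as follows. This assumes one common chemometric convention, which the abstracts themselves do not spell out: SEP is the bias-corrected standard deviation of the prediction residuals (divisor n - 1), RMSEP is the root mean square of the raw residuals (divisor n), and $r^2$ is the squared Pearson correlation between reference and predicted values. Some papers use SEP and RMSEP interchangeably, so the function name and formulas here are illustrative only.

```python
import math


def validation_stats(reference, predicted):
    """Bias, RMSEP, SEP and r^2 for an independent validation set.

    Assumed convention: SEP is bias-corrected with divisor n - 1,
    RMSEP uses the raw residuals with divisor n.
    """
    n = len(reference)
    residuals = [p - r for r, p in zip(reference, predicted)]
    bias = sum(residuals) / n
    rmsep = math.sqrt(sum(e * e for e in residuals) / n)
    sep = math.sqrt(sum((e - bias) ** 2 for e in residuals) / (n - 1))

    # Squared Pearson correlation between reference and predicted values.
    mean_r = sum(reference) / n
    mean_p = sum(predicted) / n
    sxy = sum((r - mean_r) * (p - mean_p) for r, p in zip(reference, predicted))
    sxx = sum((r - mean_r) ** 2 for r in reference)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    r2 = sxy * sxy / (sxx * syy)

    return {"bias": bias, "rmsep": rmsep, "sep": sep, "r2": r2}


# Hypothetical values: a constant offset gives bias = RMSEP = 1 but SEP = 0.
offset = validation_stats([10.0, 20.0, 30.0, 40.0], [11.0, 21.0, 31.0, 41.0])
# Hypothetical values: alternating errors give bias = 0 and SEP > RMSEP.
scatter = validation_stats([10.0, 20.0, 30.0, 40.0], [11.0, 19.0, 31.0, 39.0])
```

The two toy cases show why the distinction matters: a purely systematic offset is reported in the bias and RMSEP but leaves SEP at zero, whereas random scatter with no offset shows up in both SEP and RMSEP.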

Evaluation of benzene residue in edible oils using Fourier transform infrared (FTIR) spectroscopy

  • Joshi, Ritu;Cho, Byoung-Kwan;Lohumi, Santosh;Joshi, Rahul;Lee, Jayoung;Lee, Hoonsoo;Mo, Changyeun
    • Korean Journal of Agricultural Science / v.46 no.2 / pp.257-271 / 2019
  • The use of food grade hexane (FGH) for edible oil extraction is responsible for the presence of benzene in the crude oil. Benzene is a Group 1 carcinogen and could pose a serious threat to the health of consumers. However, its detection still depends on classical chromatographic methods, so a rapid, non-destructive detection method is needed. Hence, the aim of this study was to investigate the feasibility of using Fourier transform infrared (FTIR) spectroscopy combined with multivariate analysis to detect and quantify the benzene residue in edible oil (sesame and cottonseed oil). Oil samples were adulterated with varying quantities of benzene, and their FTIR spectra were acquired with an attenuated total reflectance (ATR) method. Optimal variables for a partial least-squares regression (PLSR) model were selected using the variable importance in projection (VIP) and the selectivity ratio (SR) methods. The developed PLS models with whole variables and with the VIP- and SR-selected variables were validated against an independent data set, which resulted in $R^2$ values of 0.95, 0.96, and 0.95 and standard error of prediction (SEP) values of 38.5, 33.7, and 41.7 mg/L, respectively. The proposed technique of FTIR combined with multivariate analysis and variable selection methods can detect benzene residuals in edible oils with the advantages of being fast and simple, and thus can replace the conventional methods used for the same purpose.

Measurement of lipid content of compost fermentation using near-infrared spectroscopy

  • Daisuke Masui;Suehara, Ken-ichiro;Yasuhisa Nakano;Takuo Yano
    • Near Infrared Analysis / v.2 no.1 / pp.37-42 / 2001
  • Near-infrared spectroscopy (NIRS) was applied to determine the lipid content of compost during the compost fermentation of tofu (soybean-curd) refuse. Lipid absorption was observed at five wavelengths (1208, 1712, 1772, 2312, and 2352 nm) in the second-derivative spectra. To formulate a calibration equation, a multiple linear regression analysis was carried out between the near-infrared spectral data and the lipid content of the calibration sample set (n=60) obtained using the Soxhlet extraction method. The multiple correlation coefficient (R) was 0.975 when the wavelengths of 1208 and 1712 nm were used in the calibration equation. To validate the calibration equation, the lipid content of a validation sample set (n=35) not used for formulating the calibration equation was calculated using the calibration equation and compared with the value obtained using the Soxhlet extraction method. Good agreement was observed between the results of the Soxhlet extraction method and those of the NIRS method. The simple correlation coefficient (r) and standard error of prediction (SEP) were 0.964 and 0.815%, respectively. The suitability of the lipid content as an indicator of the compost fermentation of tofu refuse was also studied. The decrease in the lipid content of the compost corresponded to the decrease in the total dry weight of the compost in the composter. The lipid content was a significant indicator of the compost fermentation. The NIRS method was applied to measure the time course of the lipid content during the compost fermentation, and good results were obtained. The study indicates that NIRS is a useful method for process management of the compost fermentation of tofu refuse.

Machine Vision Technique for Rapid Measurement of Soybean Seed Vigor

  • Lee, Hoonsoo;Huy, Tran Quoc;Park, Eunsoo;Bae, Hyung-Jin;Baek, Insuck;Kim, Moon S.;Mo, Changyeun;Cho, Byoung-Kwan
    • Journal of Biosystems Engineering / v.42 no.3 / pp.227-233 / 2017
  • Purpose: Morphological properties of soybean roots are important indicators of the vigor of the seed, which determines the survival rate of the seedlings grown. The current vigor test for soybean seeds is manual measurement with the human eye. This study describes an application of a machine vision technique for rapid measurement of soybean seed vigor to replace the time-consuming and labor-intensive conventional method. Methods: A CCD camera was used to obtain color images of seeds during germination. Image processing techniques were used to obtain root segmentation. The various morphological parameters, such as primary root length, total root length, total surface area, average diameter, and branching points of roots were calculated from a root skeleton image using a customized pixel-based image processing algorithm. Results: The measurement accuracy of the machine vision system ranged from 92.6% to 98.8%, with accuracies of 96.2% for primary root length and 96.4% for total root length, compared to manual measurement. The correlation coefficient for each measurement was 0.999 with a standard error of prediction of 1.16 mm for primary root length and 0.97 mm for total root length. Conclusions: The developed machine vision system showed good performance for the morphological measurement of soybean roots. This image analysis algorithm, combined with a simple color camera, can be used as an alternative to the conventional seed vigor test method.

Hyperspectral imaging technique to evaluate the firmness and the sweetness index of tomatoes

  • Rahman, Anisur;Park, Eunsoo;Bae, Hyungjin;Cho, Byoung-Kwan
    • Korean Journal of Agricultural Science / v.45 no.4 / pp.823-837 / 2018
  • The objective of this study was to evaluate the firmness and the sweetness index (SI) of tomatoes with a hyperspectral imaging (HSI) technique within the wavelength range of 1000 - 1550 nm. The hyperspectral images of 95 tomatoes were acquired with a push-broom hyperspectral reflectance imaging system, and the mean spectrum of each tomato was extracted from the regions of interest. The reference firmness and sweetness index of the same samples were measured and calibrated with their corresponding spectral data by partial least squares (PLS) regression with different preprocessing methods. The calibration model developed by PLS regression based on the Savitzky-Golay second-derivative preprocessed spectra resulted in a better performance for both the firmness and the SI of the tomatoes compared to models developed by other preprocessing methods. The correlation coefficients ($R_{pred}$) were 0.82 and 0.74, with a standard error of prediction of 0.86 N and 0.63, respectively. Then, the feature wavelengths were identified using a model-based variable selection method, i.e., variable importance in projection, from the PLS regression analyses. Finally, chemical images were derived by applying the respective regression coefficients on the spectral image in a pixel-wise manner. The resulting chemical images provided detailed information on the firmness and the SI of the tomatoes. The results show that the proposed HSI technique has potential for rapid and non-destructive evaluation of firmness and the sweetness index of tomatoes.

Partial Least Squares Analysis on Near-Infrared Absorbance Spectra by Air-dried Specific Gravity of Major Domestic Softwood Species

  • Yang, Sang-Yun;Park, Yonggun;Chung, Hyunwoo;Kim, Hyunbin;Park, Se-Yeong;Choi, In-Gyu;Kwon, Ohkyung;Cho, Kyu-Chae;Yeo, Hwanmyeong
    • Journal of the Korean Wood Science and Technology / v.45 no.4 / pp.399-408 / 2017
  • Research on the rapid and accurate prediction of physical properties of wood using near-infrared (NIR) spectroscopy has attracted recent attention. In this study, partial least squares analysis was performed between NIR spectra and air-dried specific gravity of five domestic conifer species including larch (Larix kaempferi), Korean pine (Pinus koraiensis), red pine (Pinus densiflora), cedar (Cryptomeria japonica), and cypress (Chamaecyparis obtusa). Fifty different lumbers per species were purchased from the five National Forestry Cooperative Federations of Korea. The air-dried specific gravity of 100 knot- and defect-free specimens of each species was determined by NIR spectroscopy in the range of 680-2500 nm. Spectral data preprocessing including standard normal variate, detrend and forward first derivative (gap size = 8, smoothing = 8) were applied to all the NIR spectra of the specimens. Partial least squares analysis including cross-validation (five groups) was performed with the air-dried specific gravity and NIR spectra. When the performance of the regression model was expressed as $R^2$ (coefficient of determination) and root mean square error of calibration (RMSEC), $R^2$ and RMSEC were 0.63 and 0.027 for larch, 0.68 and 0.033 for Korean pine, 0.62 and 0.033 for red pine, 0.76 and 0.022 for cedar, and 0.79 and 0.027 for cypress, respectively. For the calibration model, which contained all species in this study, the $R^2$ was 0.75 and the RMSEC was 0.37.
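
Two of the preprocessing steps named in the abstract above, standard normal variate (SNV) and a forward first derivative with a gap, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: SNV is taken as per-spectrum centering and scaling by the sample standard deviation (divisor n - 1), the derivative is a plain forward difference over `gap` points with no division by the wavelength step, and the detrend and smoothing steps are omitted. The function names are hypothetical.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by itself.

    Assumption: the sample standard deviation (divisor n - 1) is used.
    """
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(v - mean) / sd for v in spectrum]


def forward_gap_difference(spectrum, gap):
    """Forward first difference over `gap` points: x[i + gap] - x[i].

    Assumption: no smoothing and no division by the wavelength step.
    """
    return [spectrum[i + gap] - spectrum[i] for i in range(len(spectrum) - gap)]


# Hypothetical absorbance values; SNV removes multiplicative scale effects,
# so a spectrum and its doubled copy map to the same output.
a = snv([1.0, 2.0, 3.0])
b = snv([2.0, 4.0, 6.0])
d = forward_gap_difference([1.0, 4.0, 9.0, 16.0, 25.0], 2)
```

Because SNV is applied to each spectrum independently, it needs no reference spectrum, which is one practical difference from multiplicative scatter correction.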

Application of Near Infrared Spectroscopy for Nondestructive Evaluation of Nitrogen Content in Ginseng

  • Lin, Gou-lin;Sohn, Mi-Ryeong;Kim, Eun-Ok;Kwon, Young-Kil;Cho, Rae-Kwang
    • Proceedings of the Korean Society of Near Infrared Spectroscopy Conference / 2001.06a / pp.1528-1528 / 2001
  • Ginseng cultivated in different countries or under different growing conditions generally differs in components such as saponin and protein, which relate to its efficacy and action. Protein content is estimated from the nitrogen content of ginseng radix. Nitrogen content can be determined by chemical analysis such as the Kjeldahl method or extraction methods. However, these methods require a long analysis time and cause environmental pollution and sample damage. In this work, we investigated the possibility of non-destructive determination of nitrogen content in ginseng radix using near-infrared spectroscopy. Ginseng radix, the root of Panax ginseng C. A. Meyer, was studied. A total of 120 samples were used, consisting of 6 sample sets: 4-, 5-, and 6-year-old Korean ginseng and 7-, 8-, and 9-year-old Chinese ginseng. Each sample set had 20 samples. Nitrogen content was measured by electronic analysis. NIR reflectance spectra were collected over the 1100 to 2500 nm spectral region with an InfraAlyzer 500C (Bran+Luebbe, Germany) equipped with a halogen lamp and a PbS detector, and data were collected at 2 nm intervals. The calibration models were developed by multiple linear regression (MLR) and partial least squares (PLS) analysis using IDAS and SESAME software. According to the electronic analysis, the mean nitrogen content of Korean ginseng differed from that of Chinese ginseng. Nitrogen content generally tended to decrease once the cultivation period exceeded 6 years. The MLR calibration model with 8 wavelengths using IDAS software accurately predicted nitrogen content, with a correlation coefficient (R) and standard error of prediction (SEP) of 0.985 and 0.855%, respectively. With SESAME software, the MLR calibration with 9 wavelengths was selected as the best calibration; R and SEP were 0.972 and 0.596%, respectively. The PLSR calibration model resulted in an R of 0.969 and an RMSEP of 0.630. This study shows that NIR spectroscopy could be applied to determine the nitrogen content of ginseng radix with high accuracy.

Evaluation of Firmness and Sweetness Index of Tomatoes using Hyperspectral Imaging

  • Rahman, Anisur;Faqeerzada, Mohammad Akbar;Joshi, Rahul;Cho, Byoung-Kwan
    • Proceedings of the Korean Society for Agricultural Machinery Conference / 2017.04a / pp.44-44 / 2017
  • The objective of this study was to evaluate the firmness and sweetness index (SI) of tomatoes (Lycopersicum esculentum) by using hyperspectral imaging (HSI) in the range of 1000-1400 nm. The mean spectra of the 95 matured tomato samples were extracted from the hyperspectral images, and the reference firmness and sweetness index of the same samples were measured and calibrated with their corresponding spectral data by partial least squares (PLS) regression with different preprocessing methods. The results showed that the regression model developed by PLS regression based on Savitzky-Golay (S-G) second-derivative preprocessed spectra performed better for the firmness and SI of tomatoes than models developed with other preprocessing methods, with correlation coefficients (rpred) of 0.82 and 0.74 and standard errors of prediction (SEP) of 0.86 N and 0.63, respectively. Then, the feature wavelengths were identified using a model-based variable selection method, i.e., variable importance in projection (VIP), resulting from the PLS regression analyses, and finally chemical images were derived by applying the respective regression coefficients on the spectral image in a pixel-wise manner. The resulting chemical images provided detailed information on the firmness and sweetness index (SI) of tomatoes. Therefore, this research demonstrated that the HSI technique has potential for rapid and non-destructive evaluation of the firmness and sweetness index of tomatoes.

The Prediction of Blending Ratio of Cut Tobacco, Expanded Stem, and Expanded Cut Tobacco in Cigarettes using Near Infrared Spectroscopy (근적외분광법을 이용한 권련 중 일반각초, 팽화주맥 및 팽화각초 배합비 분석)

  • 김용옥;정한주;김기환
    • Journal of the Korean Society of Tobacco Science / v.22 no.1 / pp.76-83 / 2000
  • This study was carried out to predict the blending ratio of cut tobacco (CT), expanded stem (ES), and expanded cut tobacco (ECT) in cigarettes. CT, ES, and ECT samples from brand A were ground and blended with reference to the brand A blending ratio, and scanned by near infrared spectroscopy (NIRSystem Co., Model 6500). Calibration equations were developed, and the blending ratio was then determined by NIRS. The standard error of calibration (SEC) and performance (SEP) of C factory samples between NIRS and the known blending ratio were 0.97% and 1.93% for CT, 0.50% and 1.12% for ES, and 0.68% and 1.10% for ECT, respectively. The SEPs of CT, ES, and ECT for B and D factory samples determined by the C factory calibration equation were less accurate than those for C factory samples determined by the C factory calibration equation. These results were caused by differences in the CT, ES, and ECT spectra between factories. The SEPs of CT, ES, and ECT for the B and D factories determined by calibration equations derived from each factory's own samples were more accurate than those determined by the calibration equation derived from C factory samples. Each factory's SEP of CT, ES, and ECT determined by the calibration equation derived from all calibration samples (B+C+D factories) was similar to that determined by the calibration equation derived from each factory's own samples. To reduce the analytical inaccuracy caused by spectral differences, a specific calibration equation needs to be applied for each factory's samples. Data from the development of specific calibrations between samples and NIRS spectra might supply a method for rapid determination of the blending ratio of CT, ES, and ECT.