• Title/Abstract/Keywords: cross-validation test

A Study on Forecasting Accuracy Improvement of Case Based Reasoning Approach Using Fuzzy Relation (퍼지 관계를 활용한 사례기반추론 예측 정확성 향상에 관한 연구)

  • Lee, In-Ho;Shin, Kyung-Shik
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.67-84 / 2010
  • In business, forecasting is the task of estimating what is expected to happen in the future in order to make managerial decisions and plans. Accurate forecasting is therefore very important for major managerial decisions and is the basis for various business strategies. However, it is very difficult to make an unbiased and consistent estimate because of the uncertainty and complexity of the future business environment. That is why we should use scientific forecasting models to support business decision making and make an effort to minimize the model's forecasting error, which is the difference between the observation and the estimate. Nevertheless, minimizing the error is not an easy task. Case-based reasoning is a problem-solving method that uses similar past cases to solve the current problem. To build a successful case-based reasoning model, it is very important to retrieve not only the most similar case but also the most relevant case. To retrieve similar and relevant cases from past cases, the measurement of similarity between cases is a key factor. In particular, if the cases contain symbolic data, it is more difficult to measure the distances. The purpose of this study is to improve the forecasting accuracy of the case-based reasoning approach using fuzzy relation and composition. Two methods are adopted to measure the similarity between cases containing symbolic data: one derives the similarity matrix following binary logic (judging whether two symbolic values are the same), and the other derives the similarity matrix following fuzzy relation and composition. The study proceeds in the following order: data gathering and preprocessing, model building and analysis, validation analysis, and conclusion. First, in the data gathering and preprocessing stage, we collect a data set that includes categorical dependent variables. The data set is cross-sectional, and its independent variables include several qualitative variables expressed as symbolic data. The research data consist of many financial ratios and the corresponding bond ratings of Korean companies. The ratings we employ in this study cover all bonds rated by one of the bond rating agencies in Korea. Our total sample includes 1,816 companies whose commercial papers were rated in the period 1997-2000. Credit grades are defined as outputs and classified into 5 rating categories (A1, A2, A3, B, C) according to credit level. Second, in the model building and analysis stage, we derive the similarity matrix following binary logic and fuzzy composition to measure the similarity between cases containing symbolic data. The types of fuzzy composition used in this process are max-min, max-product, and max-average. The analysis is then carried out by the case-based reasoning approach with the derived similarity matrix. Third, in the validation analysis stage, we verify the validity of the model through the McNemar test based on hit ratio. Finally, we draw a conclusion from the study. As a result, the similarity measuring method using fuzzy relation and composition shows better forecasting performance than the similarity measuring method using binary logic for similarity measurement between two symbolic data values. However, the differences in forecasting performance among the types of fuzzy composition are not statistically significant. The contribution of this study is as follows. We propose a methodology in which fuzzy relation and fuzzy composition are applied to measure the similarity between two symbolic data values, which is the most important factor in building a case-based reasoning model.
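The three composition types named in the abstract above (max-min, max-product, max-average) can be illustrated with a minimal sketch. The function name `fuzzy_compose`, the example relations `R` and `S`, and the use of NumPy are assumptions made for illustration; the abstract does not say how the authors built their fuzzy relations or turned the composed relation into a similarity matrix.

```python
import numpy as np

def fuzzy_compose(R, S, kind="max-min"):
    """Compose fuzzy relations R (n x m) and S (m x p); entries lie in [0, 1].

    Hypothetical helper for illustration only, not the authors' implementation.
    """
    R = np.asarray(R, dtype=float)
    S = np.asarray(S, dtype=float)
    a = R[:, :, None]  # a[i, j, 0] = R[i, j]
    b = S[None, :, :]  # b[0, j, k] = S[j, k]
    if kind == "max-min":
        pair = np.minimum(a, b)   # min(R[i, j], S[j, k])
    elif kind == "max-product":
        pair = a * b              # R[i, j] * S[j, k]
    elif kind == "max-average":
        pair = (a + b) / 2.0      # (R[i, j] + S[j, k]) / 2
    else:
        raise ValueError(f"unknown composition type: {kind}")
    # Take the maximum over the shared index j.
    return pair.max(axis=1)

# Made-up membership values, purely illustrative.
R = [[0.2, 0.8], [1.0, 0.5]]
S = [[0.6, 0.3], [0.4, 0.9]]
for kind in ("max-min", "max-product", "max-average"):
    print(kind, fuzzy_compose(R, S, kind))
```

All three variants take the maximum over the shared index and differ only in how each pair of membership grades is combined, which is one plausible reading of why the abstract reports no statistically significant difference among them.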

QTL Analysis to Improve and Diversify the Grain Shape of Rice Cultivars in Korea, Using the Long Grain japonica Cultivar, Langi (초장립종 벼를 이용한 입형 관련 QTL 분석 및 국내 벼 품종 입형 개선 연구)

  • Kim, Suk-Man;Park, Hyun-Su;Lee, Chang-Min;Baek, Man-Kee;Cho, Young-Chan;Suh, Jung-Pil;Jeong, Oh-Young
    • KOREAN JOURNAL OF CROP SCIENCE / v.65 no.4 / pp.303-313 / 2020
  • Rice grain shape is one of the key components of grain yield and market value. An understanding of the genetic basis of the variation in grain shape could be used to improve grain shape. In this study, we developed a total of 265 F2 individuals derived from a cross between japonica cultivars (Josaeng-jado and Langi) and used this population for quantitative trait locus (QTL) analysis. Correlation analysis was performed to identify relationships between grain traits (GL: grain length, GW: grain width, L/W: ratio of length to width, TGW: 1,000 grain weight). The grain shape was positively correlated with GL and TGW, and negatively correlated with GW. In QTL analysis associated with grain shape, one QTL for GL, qGL5, detected on chromosome 5, explained 20.3% of the phenotypic variation (PV), while two QTLs for GW, qGW5 (PV=36.1) and qGW7 (PV=26.1), were identified on chromosomes 5 and 7, respectively. Evaluation of the effects of each of the QTLs on the grain shape in the population showed a significant difference in grain size between lines carrying the QTLs and lines without them. Depending on the QTL combination of the allelic types, the grain shape of the tested lines varied from semi-round to long spindle-shaped. The results of this study extend our knowledge of the genetic pool governing the diversity of grain shape in japonica cultivars and could be used to improve the grain shape of this species through marker-assisted selective breeding in Korea.

Forecasting the Precipitation of the Next Day Using Deep Learning (딥러닝 기법을 이용한 내일강수 예측)

  • Ha, Ji-Hun;Lee, Yong Hee;Kim, Yong-Hyuk
    • Journal of the Korean Institute of Intelligent Systems / v.26 no.2 / pp.93-98 / 2016
  • For accurate precipitation forecasts, the choice of weather factors and prediction method is very important. Recently, machine learning has been widely used for forecasting precipitation, and artificial neural networks, one family of machine learning techniques, have shown good performance. In this paper, we suggest a new method for forecasting precipitation using DBN, a deep learning technique. DBN has the advantage that its initial weights are set by unsupervised learning, which compensates for the defects of artificial neural networks. We used past precipitation, temperature, and the parameters of the sun's and moon's motion as features for forecasting precipitation. The dataset consists of observation data measured over 40 years by AWS in Seoul. Experiments were based on 8-fold cross validation. The model outputs probabilities for the test dataset, so a threshold was used to decide whether precipitation occurs. CSI and Bias were used to indicate the precision of the precipitation forecasts. Our experimental results showed that DBN performed better than MLP.
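The thresholding and scoring step described in the abstract above can be sketched as follows. This assumes that CSI is the critical success index, hits / (hits + misses + false alarms), and that "Bias" is the frequency bias, (hits + false alarms) / (hits + misses), as commonly defined in forecast verification; the abstract does not spell out either formula. The function name, the default threshold of 0.5, and the sample data are hypothetical.

```python
import math

def csi_and_bias(probabilities, observed, threshold=0.5):
    """Return (CSI, frequency bias) for thresholded rain / no-rain forecasts.

    Illustrative sketch only; the threshold value is an assumption.
    """
    hits = misses = false_alarms = 0
    for p, rained in zip(probabilities, observed):
        forecast_rain = p >= threshold
        if forecast_rain and rained:
            hits += 1            # rain forecast, rain observed
        elif rained:
            misses += 1          # no rain forecast, rain observed
        elif forecast_rain:
            false_alarms += 1    # rain forecast, no rain observed
    events = hits + misses + false_alarms
    csi = hits / events if events else math.nan
    observed_rain = hits + misses
    bias = (hits + false_alarms) / observed_rain if observed_rain else math.nan
    return csi, bias

# Made-up probabilities and observations (1 = rain, 0 = no rain).
probs = [0.9, 0.2, 0.7, 0.4, 0.1]
rain = [1, 0, 0, 1, 0]
print(csi_and_bias(probs, rain, threshold=0.5))
```

Lowering the threshold turns more probabilities into rain forecasts, which tends to raise the frequency bias above 1, so the threshold is usually chosen with both scores in view.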

Forecasting of Customer's Purchasing Intention Using Support Vector Machine (Support Vector Machine 기법을 이용한 고객의 구매의도 예측)

  • Kim, Jin-Hwa;Nam, Ki-Chan;Lee, Sang-Jong
    • Information Systems Review / v.10 no.2 / pp.137-158 / 2008
  • The rapid development of various information technologies creates new opportunities in online and offline markets. In this changing market environment, customers have various demands for new products and services, so their power and influence on the markets grow stronger each year. Companies have paid great attention to customer relationship management (CRM). In particular, personalized product recommendation systems, which recommend products and services based on customers' private information or in-store purchasing behavior, are an important asset to most companies. CRM is one of the important business processes in which reliable information is mined from customer databases. Data mining techniques such as artificial intelligence are popular tools for extracting useful information and knowledge from these customer databases. In this research, we propose a recommendation system that predicts customers' purchase intention. The customer's purchasing intention for a specific product is predicted by applying data mining techniques to a receipt data set. The performance of the suggested method is compared with that of other data mining techniques.

A Melon Fruit Grading Machine Using a Miniature VIS/NIR Spectrometer: 2. Design Factors for Optimal Interactance Measurement Setup

  • Suh, Sang-Ryong;Lee, Kyeong-Hwan;Yu, Seung-Hwa;Shin, Hwa-Sun;Yoo, Soo-Nam;Choi, Yong-Soo
    • Journal of Biosystems Engineering / v.37 no.3 / pp.177-183 / 2012
  • Purpose: In near-infrared spectroscopy, an interactance configuration of a light source and a spectrometer probe can provide more information about the internal attributes of fruit than reflectance or transmittance configurations. However, there has been no thorough study of the parameters of the interactance measurement setup. The objective of this study was to investigate the effect of these parameters on the estimation of soluble solids content (SSC) and firmness of muskmelons. Methods: Melon samples were taken from greenhouses in three different harvesting seasons. Prediction models were developed at three distances (2, 5, and 8 cm) between the light source and the spectrometer probe, with three numbers of measurement points (2, 3, and 6) evenly distributed on each sample, and with different numbers of fruit samples for the calibration models. The performance of the models was compared. Results: In the test at the three distances, the best results were found at a distance of 5 cm. The cross-validation coefficients of determination ($R_{cv}^2$) were 0.717 (standard error of prediction, SEP = $1.16\,^{\circ}$Brix) and 0.504 (SEP = 4.31 N) for the estimation of SSC and firmness, respectively. The minimum number of measurement points required to fully represent the spectral characteristics of each fruit sample was 3. The highest $R_{cv}^2$ values were 0.736 (SEP = $0.87\,^{\circ}$Brix) and 0.644 (SEP = 4.16 N) for the estimation of SSC and firmness, respectively. Model performance began to saturate when 60 fruit samples were used to develop the calibration models. The highest $R_{cv}^2$ values of 0.713 (SEP = $0.88\,^{\circ}$Brix) and 0.750 (SEP = 3.30 N) for the estimation of SSC and firmness, respectively, were achieved. Conclusions: The performance of the prediction models differed considerably depending on the conditions of the interactance measurement setup. In designing a fruit grading machine with an interactance configuration, the parameters of the interactance measurement setup should be chosen carefully.

An Analysis on Information Seeking Behavior and Needs of Hearing Impaired College Students (청각장애 대학생의 도서관 이용행태와 정보요구에 대한 연구)

  • Jang, Bo Seong
    • Journal of the Korean Society for Information Management / v.32 no.1 / pp.297-316 / 2015
  • This study examines how hearing-impaired college students use libraries and what their information needs are, in order to provide basic material for developing library service programs suited to these students. To this end, the study gathered data from a total of 155 hearing-impaired college students through a survey and interviews, and analyzed the data using frequency analysis, cross validation, t-tests, and one-way ANOVA. The study confirmed that the students' gender, year of study, degree of disability, school, major, and prosthetic appliance make significant differences in how they use libraries. In addition, the study examined differences in the students' information needs by type of prosthetic appliance, school, and degree of disability, and found that the type of prosthetic appliance significantly affects every category of information needs. The study also found that both school and degree of disability make significant differences in a few categories of information needs: the former influences user education and promotion and the arrangement of sign language interpreters, while the latter affects user education and promotion and improvements to the browsing environment.

Estimated Soft Information based Most Probable Classification Scheme for Sorting Metal Scraps with Laser-induced Breakdown Spectroscopy (레이저유도 플라즈마 분광법을 이용한 폐금속 분류를 위한 추정 연성정보 기반의 최빈 분류 기술)

  • Kim, Eden;Jang, Hyemin;Shin, Sungho;Jeong, Sungho;Hwang, Euiseok
    • Resources Recycling / v.27 no.1 / pp.84-91 / 2018
  • In this study, a novel soft-information-based most probable classification scheme is proposed for sorting recyclable metal alloys with laser-induced breakdown spectroscopy (LIBS). Regression analysis of LIBS-captured spectra to estimate the concentrations of common elements can be efficient for classifying unknown arbitrary metal alloys, even when a particular alloy is not included in training. Therefore, partial least squares regression (PLSR) is employed in the proposed scheme, with spectra of certified reference materials (CRMs) used for training. With the PLSR model, the concentrations for the test spectrum are estimated independently and compared with those of the CRMs to find the most probable class. Joint soft information can then be obtained by assuming a multivariate normal (MVN) distribution, which makes it possible to account for the probability measure or a priori information and improves classification performance. To evaluate the proposed scheme, MVN soft information is computed from PLSR of LIBS-captured spectra of 9 metal CRMs and tested for classifying unknown metal alloys. Furthermore, the likelihood is visualized with a radar chart to search effectively for the most probable class among the candidates. In leave-one-out cross-validation tests, the proposed scheme not only shows improved classification accuracy but is also helpful for adaptive post-processing to correct misclassifications.

Development of Moisture Content Prediction Model for Larix kaempferi Sawdust Using Near Infrared Spectroscopy (근적외선 분광분석법을 이용한 낙엽송 목분의 함수율 예측 모델 개발)

  • Chang, Yoon-Seong;Yang, Sang-Yun;Chung, Hyunwoo;Kang, Kyu-Young;Choi, Joon-Weon;Choi, In-Gyu;Yeo, Hwanmyeong
    • Journal of the Korean Wood Science and Technology / v.43 no.3 / pp.304-310 / 2015
  • The moisture content of sawdust must be measured accurately and controlled appropriately during storage and transportation because improper moisture can cause biological degradation. In this study, to measure the moisture content (MC) of Larix kaempferi sawdust, the near-infrared (NIR) reflectance spectra (wavelength 1000-2400 nm) of the sawdust were used as the detection parameter. After acquiring the NIR reflectance spectra of specimens humidified at each relative humidity condition ($25^{\circ}C$, RH 30~99%), a moisture content prediction model was developed by applying mathematical preprocessing (e.g. smoothing, standard normal variate) and partial least squares (PLS) analysis to the acquired spectral data. The high reliability of the MC regression model based on NIR spectroscopy was verified by a cross-validation test ($R^2$ = 0.94, RMSEP = 1.544). The results of this study show that NIR spectroscopy could be used as a convenient and accurate method for the nondestructive determination of the moisture content of sawdust, which could help optimize wood utilization.
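The two figures quoted in the abstract above, $R^2$ and RMSEP, can be computed from measured and cross-validated predicted values roughly as in the sketch below. It assumes that RMSEP is the root mean square error of prediction and that $R^2$ is defined as 1 minus the ratio of residual to total sum of squares; the abstract does not state which definition of $R^2$ was used (some NIR studies report the squared correlation instead). The function name and the sample values are hypothetical and are not taken from the paper.

```python
import math

def rmsep_and_r2(measured, predicted):
    """Return (RMSEP, R^2) for paired measured and predicted values.

    Assumes the measured values are not all identical (otherwise R^2 is undefined).
    """
    n = len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))  # residual sum of squares
    mean_y = sum(measured) / n
    ss_tot = sum((y - mean_y) ** 2 for y in measured)                # total sum of squares
    rmsep = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    return rmsep, r2

# Made-up moisture contents (%) and their cross-validated predictions.
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
print(rmsep_and_r2(measured, predicted))
```

In a cross-validation setting, each predicted value would come from a model fitted without the corresponding specimen, so both figures describe out-of-sample performance.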

Cross-cultural Validation Test and Application of LSS-short Form (여가만족척도(LSS-short form)의 타당도 검증과 적용)

  • Kim, Mi-Lyang;Lee, Yeon-Ju;Hwang, Sun-Hwan
    • The Journal of the Korea Contents Association / v.10 no.11 / pp.435-445 / 2010
  • The purpose of the current study was to examine the validity of the Leisure Satisfaction Scale (LSS) and confirm the relationship between the LSS and subjective well-being. The LSS developed by Beard and Ragheb[9] has been used in Korea for about 20 years without an examination of its validity, since it was first used in Lee[2]'s study. First, all items of the original LSS and the operational definitions of its factors were translated, back-translated, and reviewed by experts in leisure studies. A total of 515 respondents participated in the study. To achieve the goals of this study, item-total correlation, correlation analysis, exploratory factor analysis, reliability analysis, multiple regression, and confirmatory factor analysis were conducted with SPSS 14.0 and AMOS 5.0. The main findings are as follows. First, the factor names (psychological, educational, social, physical, esthetic, and relaxation) and items were modified. Second, modified operational definitions of the factors were presented. The reliability and validity of the LSS, consisting of 24 items and five factors, were very high. The psychological and relaxation factors of the LSS affected subjective well-being positively. The new LSS would be useful for future studies of leisure satisfaction.

Calpain-10 SNP43 and SNP19 Polymorphisms and Colorectal Cancer: a Matched Case-control Study

  • Hu, Xiao-Qin;Yuan, Ping;Luan, Rong-Sheng;Li, Xiao-Ling;Liu, Wen-Hui;Feng, Fei;Yan, Jin;Yang, Yan-Fang
    • Asian Pacific Journal of Cancer Prevention / v.14 no.11 / pp.6673-6680 / 2013
  • Objective: Insulin resistance (IR) is an established risk factor for colorectal cancer (CRC). Given that CRC and IR physiologically overlap and the calpain-10 gene (CAPN10) is a candidate for IR, we explored the association between CAPN10 and CRC risk. Methods: Blood samples of 400 case-control pairs were genotyped, and the lifestyle and dietary habits of these pairs were recorded and collected. Unconditional logistic regression (LR) was used to assess the effects of CAPN10 SNP43 and SNP19, and environmental factors. Both generalized multifactor dimensionality reduction (GMDR) and the classification and regression tree (CART) were used to test gene-environment interactions for CRC risk. Results: The GA+AA genotype of SNP43 and the Del/Ins+Ins/Ins genotype of SNP19 were marginally related to CRC risk (GA+AA: OR = 1.35, 95% CI = 0.92-1.99; Del/Ins+Ins/Ins: OR = 1.31, 95% CI = 0.84-2.04). Notably, a high-order interaction was consistently identified by GMDR and CART analyses. In GMDR, the four-factor interaction model of SNP43, SNP19, red meat consumption, and smoked meat consumption was the best model, with a maximum cross-validation consistency of 10/10 and testing balance accuracy of 0.61 (P < 0.01). In LR, subjects with high red and smoked meat consumption and two risk genotypes had a 6.17-fold CRC risk (95% CI = 2.44-15.6) relative to that of subjects with low red and smoked meat consumption and null risk genotypes. In CART, individuals with high smoked and red meat consumption, SNP19 Del/Ins+Ins/Ins, and SNP43 GA+AA had higher CRC risk (OR = 4.56, 95%CI = 1.94-10.75) than those with low smoked and red meat consumption. Conclusions: Though the single loci of CAPN10 SNP43 and SNP19 are not enough to significantly increase the CRC susceptibility, the combination of SNP43, SNP19, red meat consumption, and smoked meat consumption is associated with elevated risk.