• Title/Summary/Keyword: Statistical classification

A Comparative Study on Chewing Movement in Normal Occlusion and Skeletal Class III Malocclusion (정상교합자와 골격성 III급 부정교합자의 저작운동형태의 비교)

  • SUNG, Kee-Hyuk;SUNG, Jae-Hyun
    • The Korean Journal of Orthodontics
    • /
    • v.27 no.5 s.64
    • /
    • pp.801-813
    • /
    • 1997
  • A comparative study was made of the chewing movements of subjects with normal occlusion and patients with skeletal class III malocclusion. Thirty normal occlusion subjects and twenty skeletal class III malocclusion patients were given chewing gum for the study; using the BioPAK system, the chewing movement on the frontal plane was recorded and analyzed. With a typical chewing path chosen to represent each subject, chewing width, opening distance, opening and closing angles, and maximum opening and closing velocities were observed. Seven characteristic patterns were classified based on the types of chewing paths. The results were as follows: 1. Compared with the normal occlusion group, the skeletal class III malocclusion group showed more varied and vertical chewing patterns. 2. In comparison of chewing widths, the skeletal class III malocclusion group showed a narrower path than the normal occlusion group (p<0.01). 3. In opening distance, the skeletal class III malocclusion group appeared shorter than the normal occlusion group, without statistical significance (p>0.05). 4. In opening and closing angles, the skeletal class III malocclusion group showed more acute angles than the normal occlusion group (p<0.01). 5. In maximum opening and closing velocities, the skeletal class III malocclusion group was slower than the normal occlusion group, but with no statistical significance (p>0.05). 6. In the classification of chewing movement patterns, the normal occlusion group had Type II as the highest rate at 73.4%; in the skeletal class III malocclusion group, the highest rate was Type III at 35.0%, followed by Type II at 30.0%. 7. In the classification of chewing movement patterns, Type IV (chopping type) showed a higher rate in the skeletal class III malocclusion group (25.0%) than in the normal occlusion group (3.3%).

Characterizing the Spatial Distribution of Oak Wilt Disease Using Remote Sensing Data (원격탐사자료를 이용한 참나무시들음병 피해목의 공간분포특성 분석)

  • Cha, Sungeun;Lee, Woo-Kyun;Kim, Moonil;Lee, Sle-Gee;Jo, Hyun-Woo;Choi, Won-Il
    • Journal of Korean Society of Forest Science
    • /
    • v.106 no.3
    • /
    • pp.310-319
    • /
    • 2017
  • This study categorized the damaged trees by supervised classification using time-series aerial photographs of Bukhan, Cheonggye, and Suri mountains, because oak wilt disease appeared to be concentrated in the metropolitan regions. In order to analyze the spatial characteristics of the damaged areas, geographical characteristics such as elevation and slope were statistically analyzed to confirm their strong correlation. From the statistical analysis of Moran's I, we retrieved the following: (i) the value of Moran's I in Bukhan mountain was estimated to be 0.25, 0.32, and 0.24 in 2009, 2010, and 2012, respectively; (ii) the value in Cheonggye mountain was estimated to be 0.26, 0.32, and 0.22 in 2010, 2012, and 2014, respectively; and (iii) the value in Suri mountain was estimated to be 0.42 and 0.42 in 2012 and 2014, respectively. These numbers suggest that the damaged trees are distributed in clusters. In addition, we conducted hotspot analysis to identify how the damaged tree clusters shift over time, and we were able to verify that hotspots move in time series. According to the analysis of the entire hotspot areas (z-score>1.65), there was an 80 percent probability of oak wilt disease occurring in broadleaf or mixed-stand forests with an elevation of 200~400 m and a slope of 20~40 degrees. This result indicates that oak wilt disease hotspots can occur in, or shift into, areas with these geographical features or forest conditions. Therefore, this outcome can be used as a basic resource when predicting oak wilt disease spread patterns, and it can help prevent pest- and disease-related damage by assisting policy makers in implementing the necessary measures.
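
As an illustration of the clustering statistic reported above, here is a minimal Python sketch of a global Moran's I computation, I = (n/W) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z are mean-centered values and W is the sum of the spatial weights. The value vector and binary adjacency matrix below are toy stand-ins, not the study's damaged-tree data.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for observations `values` and an n x n
    spatial weight matrix `weights` (zeros on the diagonal)."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    z = x - x.mean()                       # mean-centered values
    num = n * np.sum(w * np.outer(z, z))   # spatially weighted cross-products
    den = w.sum() * np.sum(z ** 2)
    return num / den

# Toy example: four locations on a line with rook-style adjacency.
vals = [3.0, 2.5, 0.4, 0.3]
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
print(round(morans_i(vals, adj), 3))  # positive value -> spatial clustering
```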

Development of a Failure Probability Model based on Operation Data of Thermal Piping Network in District Heating System (지역난방 열배관망 운영데이터 기반의 파손확률 모델 개발)

  • Kim, Hyoung Seok;Kim, Gye Beom;Kim, Lae Hyun
    • Korean Chemical Engineering Research
    • /
    • v.55 no.3
    • /
    • pp.322-331
    • /
    • 2017
  • District heating was first introduced in Korea in 1985. As the service life of the underground thermal piping network has grown to more than 30 years, maintenance of the underground thermal pipes has become an important issue. A variety of complex technologies are required for the periodic inspection and operation management needed to maintain the aged thermal piping network. In particular, a model is needed that can support field decision making in deriving the optimal maintenance and replacement points from an economic viewpoint. In this study, the analysis was carried out based on the repair history and accident data from the operation of the thermal pipe networks of five districts of the Korea District Heating Corporation. A failure probability model was developed using the statistical techniques of qualitative analysis and binomial logistic regression analysis. The qualitative analysis of maintenance history and accident data showed that the most important causes of pipeline damage were poor construction, pipe corrosion, and defective material, which together accounted for about 82%. In the statistical model analysis, setting the classification cutoff to 0.25 improved the accuracy of classifying thermal pipes into breakage and non-breakage to 73.5%. To establish the failure probability model, the fitness of the model was verified through the Hosmer and Lemeshow test, an independence test of the independent variables, and a Chi-Square test of the model. According to the analysis of the risk of thermal pipe network damage, the highest probability of failure was found for reducer pipes of less than 250 mm, more than 10 years old, constructed by the F construction company, located on motorways in the Seoul area, and operating in winter. The results of this study can be used to prioritize maintenance, preventive inspection, and replacement of thermal piping systems. In addition, they make it possible to reduce the frequency of thermal pipeline damage and to manage the thermal piping network more proactively by establishing accident prevention plans, such as inspection and maintenance, in advance.
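
To make the cutoff concrete, here is a minimal sketch of a binomial logistic regression whose decision threshold is lowered from the default 0.5 to 0.25, as in the study. The predictors, labels, and their relationship are synthetic stand-ins, not the KDHC operation data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Stand-in predictors: pipe age (years), diameter (mm), winter flag.
# The paper's real predictors come from KDHC repair/accident records.
X = np.column_stack([
    rng.uniform(0, 30, 500),            # service age
    rng.choice([250, 400, 600], 500),   # pipe diameter
    rng.integers(0, 2, 500),            # winter operation
])
# Synthetic labels loosely tied to age so the model has some signal.
y = (X[:, 0] + rng.normal(0, 5, 500) > 20).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Classify "failure" when the predicted probability exceeds 0.25,
# instead of the default 0.5 cutoff, as the study does.
p_fail = model.predict_proba(X)[:, 1]
pred = (p_fail >= 0.25).astype(int)
print("accuracy at the 0.25 cutoff:", accuracy_score(y, pred))
```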

Discrimination Analysis of Production Year of Rice and Brown Rice based on Phospholipids (인지질을 이용한 쌀과 현미의 생산연도 판별 분석)

  • Hong, Jee-Hwa;Ahn, Jongsung;Kim, Yong-Kyoung;Choi, Kyung-Hu;Lee, Min-Hui;Park, Young-Jun;Kim, Hyun-Tae;Lee, Jae-Hwon
    • KOREAN JOURNAL OF CROP SCIENCE
    • /
    • v.62 no.2
    • /
    • pp.105-112
    • /
    • 2017
  • The mixing of rice and brown rice produced in different years is banned in Korea by the Grain Management Act. However, no method for discriminating the production year of rice has been reported. The objective of this study was to develop a method for discriminating the production year of rice and brown rice based on their phospholipid content. One hundred rice samples and 130 brown rice samples produced between 2012 and 2015 were collected. Twelve phosphatidylcholine components were analyzed by liquid chromatography-tandem mass spectrometry. Phosphatidylcholine was used as an internal standard to calculate the peak intensity of the samples. A statistical analysis of the results showed that the centroid distance between the stale and new rice was 4.16 and the classification ratio was 97%. To verify the calculated discriminant, 61 and 40 rice samples were collected for primary and secondary verification, respectively. The accuracy of discrimination was 82% by primary verification and 80% by secondary verification. The statistical analysis of brown rice showed that the centroid distance between the stale and new brown rice was 3.14 and the classification ratio was 96%. To verify the calculated discriminant, 10 samples of new rice and 30 samples of stale rice were collected, and the accuracy of discrimination was 93%. The accuracy of discrimination for rice stored at room temperature was 57.9-92.1% and that for rice stored at a low temperature was 86.8-94.7%, depending on the storage period. For brown rice, the accuracy was 94.7-100% at room temperature and 92.1-100% at a low temperature, depending on the storage period. The accuracy of discrimination for rice was affected by the storage temperature and time, while that for brown rice was more than 92% regardless of the storage conditions. These results suggest that the developed discriminant analysis method could be utilized to determine the production year of rice and brown rice.
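
As a rough sketch of the discriminant analysis described above (assuming a linear discriminant; the abstract does not name the exact variant), the example below fits an LDA model to synthetic 12-component phosphatidylcholine profiles and reports a centroid distance and classification ratio analogous to the quoted figures. All numbers are illustrative.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)

# Stand-in data: 12 phosphatidylcholine peak intensities per sample,
# with new and stale samples drawn from shifted distributions.
# The dimensions mirror the paper; the values are synthetic.
new_rice = rng.normal(1.0, 0.2, size=(100, 12))
stale_rice = rng.normal(1.4, 0.2, size=(100, 12))
X = np.vstack([new_rice, stale_rice])
y = np.array([0] * 100 + [1] * 100)  # 0 = new, 1 = stale

lda = LinearDiscriminantAnalysis().fit(X, y)

# Distance between class centroids on the discriminant axis, i.e. the
# kind of "centroid distance" the abstract reports (4.16 for rice).
scores = lda.transform(X).ravel()
dist = abs(scores[y == 0].mean() - scores[y == 1].mean())
print(f"centroid distance: {dist:.2f}")
print(f"classification ratio: {lda.score(X, y):.0%}")
```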

Simultaneous Optimization of a KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.139-157
    • /
    • 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained on a randomly chosen feature subspace from the original feature set, and predictions from the ensemble members are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving on an individual KNN model. The k parameter of the KNN base classifiers and the feature subsets selected for the base classifiers play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposed a new ensemble method that improves the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of the base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve the prediction accuracy of the ensemble model. The proposed model was applied to a bankruptcy prediction problem using a real dataset from Korean companies. The research data included 1,800 externally non-audited firms that filed for bankruptcy (900 cases) or non-bankruptcy (900 cases). Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent-sample t-test of each financial ratio as an input variable and bankruptcy or non-bankruptcy as an output variable. Of these, 24 financial ratios were selected using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other for avoiding overfitting. The prediction accuracy against the latter portion was used to determine the fitness value so as to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the performance of the proposed model with that of other models. To evaluate the effectiveness of the proposed model, its classification accuracy was compared with that of other models, and the Q-statistic values and average classification accuracies of the base classifiers were investigated. The experimental results showed that the proposed model outperformed the other models, such as the single model and the random subspace ensemble model.
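
The following sketch illustrates the core idea of the proposed model: one chromosome encodes the k value and the feature subset of every KNN base classifier, and a small genetic loop optimizes them against a hold-out fitness set. The data are synthetic, and the GA is deliberately reduced (mutation only, ten individuals, no crossover); the paper's actual GA settings and the 24 financial ratios are not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(42)

# Synthetic stand-in for the 24-ratio bankruptcy data used in the paper.
X, y = make_classification(n_samples=600, n_features=24, n_informative=10,
                           random_state=42)
X_tr, X_fit, y_tr, y_fit = train_test_split(X, y, test_size=0.3,
                                            random_state=42)

N_BASE, N_FEAT = 10, X.shape[1]

def random_individual():
    # One chromosome per ensemble: a k value and a boolean feature
    # mask for each of the N_BASE KNN members.
    ks = rng.integers(1, 16, size=N_BASE)
    masks = rng.random((N_BASE, N_FEAT)) < 0.5
    masks[~masks.any(axis=1), 0] = True   # never allow an empty subset
    return ks, masks

def fitness(ind):
    # Fitness = majority-vote accuracy on the hold-out fitness set.
    ks, masks = ind
    votes = np.zeros((len(y_fit), N_BASE))
    for i in range(N_BASE):
        knn = KNeighborsClassifier(n_neighbors=int(ks[i]))
        knn.fit(X_tr[:, masks[i]], y_tr)
        votes[:, i] = knn.predict(X_fit[:, masks[i]])
    majority = (votes.mean(axis=1) >= 0.5).astype(int)
    return (majority == y_fit).mean()

def mutate(ind):
    ks, masks = ind
    ks, masks = ks.copy(), masks.copy()
    i = rng.integers(N_BASE)
    ks[i] = rng.integers(1, 16)           # perturb one member's k
    masks[i, rng.integers(N_FEAT)] ^= True  # flip one feature bit
    masks[~masks.any(axis=1), 0] = True
    return ks, masks

# Tiny GA loop: keep the best half, refill with mutated copies.
pop = [random_individual() for _ in range(10)]
for gen in range(15):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:5] + [mutate(p) for p in pop[:5]]
best = max(pop, key=fitness)
print("hold-out accuracy of the optimized ensemble:", round(fitness(best), 3))
```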

Classification of the Core Climatic Region Established by the Entropy of Climate Elements - Focused on the Middle Part Region - (기후요소의 엔트로피에 의한 핵심 기후지역의 구분 - 중부지방을 중심으로 -)

  • Park, Hyun-Wook;Chung, Sung-Suk;Park, Keon-Yeong
    • Journal of the Korean earth science society
    • /
    • v.27 no.2
    • /
    • pp.159-176
    • /
    • 2006
  • Geographic factors and the mathematical location of the Korean Peninsula have a great influence on the ten-day variation patterns of summer precipitation. In order to clarify the influence of several climate factors on precise climate classification in the central region of Korea, weather entropy and the information ratio were calculated on the basis of information theory, using observation data from 25 sites. The data used for this study are the daily precipitation phenomena over ten-day periods of summer during the recent thirteen years (1991-2003) at the 25 stations in the central region of Korea. Daily precipitation was divided into four classes: no rain, 0.1~10.0 mm/day, 10.1~30.0 mm/day, and over 30.1 mm/day. Their temporal and spatial changes were also analyzed. The results are as follows: the maximum and minimum values of the calculated weather entropy are 1.870 bits at Chuncheon in the latter ten days of July and 0.960 bits at Ganghwa during mid-September, respectively. Weather entropy at each observation site tends to be larger at the beginning of August and smaller towards the end of September. The largest and smallest values of weather representativeness based on the information ratio were observed at Chungju in the beginning of June and at Daegwallyeong towards the end of July, respectively. However, the largest values of weather representativeness came out during the middle or later part of September when 15 sites were adopted as the center of weather forecasting. The representative core region of weather forecasting and climate classification in the central region of Korea lies inside the triangle formed by Buyeo, Incheon, and Gangneung.
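
For concreteness, the weather entropy used above is the Shannon entropy, in bits, of the four-class daily precipitation distribution for a given site and ten-day period, H = -sum_i p_i log2(p_i), with a maximum of 2 bits for four classes. A minimal sketch with invented class counts:

```python
import numpy as np

def weather_entropy(class_counts):
    """Shannon entropy in bits over the four precipitation classes
    (no rain, 0.1~10.0, 10.1~30.0, over 30.1 mm/day)."""
    p = np.asarray(class_counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return -(p * np.log2(p)).sum()

# Illustrative counts for one ten-day period pooled over 13 years:
# how many of the ~130 days fell into each class.
print(round(weather_entropy([70, 35, 18, 7]), 3))   # mixed regime
print(round(weather_entropy([120, 6, 3, 1]), 3))    # mostly dry, lower H
```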

A Study of Variables Related to Nursing Productivity (간호생산성에 관한 연구: 관련변수의 검증을 중심으로)

  • 박광옥
    • Journal of Korean Academy of Nursing
    • /
    • v.24 no.4
    • /
    • pp.584-596
    • /
    • 1994
  • The objective of this study was to explore the relationships among the variables of nursing productivity, within the framework of a systems model, in a tertiary university-based care hospital in Korea. Productivity is basically defined as the relationship between inputs and outputs. The study proceeds from the proposition that the nursing unit is a system that produces nursing care output using personal and material resources, through nursing intervention and nursing care management. This conception of the nursing productivity system comprises input, process, output, and feedback. These categorized variables are essential to producing desirable and meaningful output. While nursing personnel from the head nurse to the staff nurses cooperate with each other, the head nurse directs her subordinates to achieve the goal of the nursing care unit. In this procedure, the head nurse uses leadership of authority and benevolence. Meanwhile, nursing productivity is greatly influenced by the environment and surrounding organizational structures, and also by the operational objectives, policies, and standards of procedure. For the study of nursing productivity, one sample hospital with 15 general nursing care units was selected. Research data were collected for 3 weeks, from May 31 to June 20, 1993. Input variables were measured in terms of both the served and the server. Patient classification scores were measured daily by the degree of nursing care needs, indicating patient case-mix. Nurses' professional education period, clinical experience, and personality scores were measured as producer input variables by questionnaires. The process variables act on the input resources and result in desirable nursing outputs; thus, the head nurse's leadership as perceived by her followers was defined as the process variable. The output variables were defined as length of stay, average nursing care hours per patient per day, the score of quality of nursing care, the score of patient satisfaction, and the score of nurses' job satisfaction. The nursing unit was the unit of analysis, and various statistical analyses were used: reliability analysis (Cronbach's alpha) for the 5 measurement tools, and Pearson correlation analysis, multiple regression analysis, and canonical correlation analysis for testing the relationships among the variables. The results were as follows: 1. A significant positive relationship between the score of patient classification and length of stay was found (r=.6095, p=.008). 2. The regression coefficient between the score of patient classification and length of stay was significant (β=.6245, p=.0128), and the variance explained was 39%. 3. A significant negative relationship between nurses' educational period and length of stay was found (r=-.4546, p=.044). 5. The regression coefficient between nurses' educational period and the score of quality of nursing care was significant (β=.5600, p=.029), and the variance explained was 31.4%. 6. A significant positive relationship between the score of the head nurse's authoritative leadership and length of stay was found (r=.5869, p=.011). 7. A significant negative relationship between the score of the head nurse's benevolent leadership and average nursing care hours was found (r=-.4578, p=.043). 8. The regression coefficient between the score of the head nurse's benevolent leadership and average nursing care hours was significant (β=-.6912, p=.0043), and the variance explained was 47.8%. 9. A significant positive relationship between the score of the head nurse's benevolent leadership and the score of nurses' job satisfaction was found (r=.4499, p=.050). 10. A significant canonical correlation was found between the group of independent variables consisting of the nurses' personality score and the head nurse's authoritative leadership score, and the group of dependent variables consisting of length of stay and average nursing care hours (Rc²=.4771, p=.041). Through these results, the assumed relationships among the input variables, the process variable, and the output variables were partly supported. Further study is needed, in larger research samples, on the relationships between nurses' personality and educational period, and between nurses' clinical experience (including skill level) and the output variables.
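
As a sketch of the canonical correlation step in result 10, the following Python example runs a one-component CCA between a producer-side input set and an output set. The unit-level numbers are synthetic stand-ins for the study's 15 nursing units, so the computed Rc illustrates only the mechanics, not the reported value.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(7)

# Stand-in unit-level data: 15 nursing units, as in the study.
# X = producer-side inputs, Y = outputs; all values are synthetic.
n_units = 15
X = np.column_stack([
    rng.normal(50, 10, n_units),  # nurses' personality score
    rng.normal(3, 1, n_units),    # head nurse authoritative-leadership score
])
Y = np.column_stack([
    rng.normal(9, 2, n_units),    # length of stay (days)
    rng.normal(4, 1, n_units),    # nursing care hours per patient-day
])

# Fit the first canonical pair and correlate the two variates.
cca = CCA(n_components=1).fit(X, Y)
u, v = cca.transform(X, Y)
rc = np.corrcoef(u.ravel(), v.ravel())[0, 1]
print(f"first canonical correlation Rc = {rc:.3f}, Rc^2 = {rc**2:.3f}")
```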

Corporate Bond Rating Using Various Multiclass Support Vector Machines (다양한 다분류 SVM을 적용한 기업채권평가)

  • Ahn, Hyun-Chul;Kim, Kyoung-Jae
    • Asia pacific journal of information systems
    • /
    • v.19 no.2
    • /
    • pp.157-178
    • /
    • 2009
  • Corporate credit rating is a very important factor in the market for corporate debt. Information concerning corporate operations is often disseminated to market participants through the changes in credit ratings published by professional rating agencies, such as Standard and Poor's (S&P) and Moody's Investor Service. Since these agencies generally require a large fee for the service, and the periodically provided ratings sometimes do not reflect the default risk of the company at the time, it may be advantageous for bond-market participants to be able to classify credit ratings before the agencies actually publish them. As a result, it is very important for companies (especially financial companies) to develop a proper model of credit rating. From a technical perspective, credit rating constitutes a typical multiclass classification problem, because rating agencies generally have ten or more categories of ratings. For example, S&P's ratings range from AAA for the highest-quality bonds to D for the lowest-quality bonds. The professional rating agencies emphasize the importance of analysts' subjective judgments in the determination of credit ratings. However, in practice, a mathematical model that uses the financial variables of companies plays an important role in determining credit ratings, since it is convenient to apply and cost efficient. These financial variables include ratios that represent a company's leverage status, liquidity status, and profitability status. Several statistical and artificial intelligence (AI) techniques have been applied as tools for predicting credit ratings. Among them, artificial neural networks are most prevalent in the area of finance because of their broad applicability to many business problems and their preeminent ability to adapt. However, artificial neural networks also have many defects, including the difficulty of determining the values of the control parameters and the number of processing elements in the layer, as well as the risk of over-fitting. Of late, because of their robustness and high accuracy, support vector machines (SVMs) have become popular as a solution for problems requiring accurate prediction. An SVM's solution may be globally optimal because SVMs seek to minimize structural risk. On the other hand, artificial neural network models may tend to find locally optimal solutions because they seek to minimize empirical risk. In addition, no parameters need to be tuned in SVMs, barring the upper bound for non-separable cases in linear SVMs. Since SVMs were originally devised for binary classification, however, they are not intrinsically geared for multiclass classifications, as in credit rating. Thus, researchers have tried to extend the original SVM to multiclass classification. Hitherto, a variety of techniques to extend standard SVMs to multiclass SVMs (MSVMs) has been proposed in the literature. However, only a few types of MSVM have been tested in prior studies that apply MSVMs to credit rating. In this study, we examined six different techniques of MSVMs: (1) One-Against-One, (2) One-Against-All, (3) DAGSVM, (4) ECOC, (5) the method of Weston and Watkins, and (6) the method of Crammer and Singer. In addition, we examined the prediction accuracy of some modified versions of conventional MSVM techniques. To find the most appropriate technique of MSVMs for corporate bond rating, we applied all these techniques to a real-world case of credit rating in Korea. We focused on corporate bond rating, the most frequently studied area of credit rating, covering specific debt issues or other financial obligations. For our study, the research data were collected from National Information and Credit Evaluation, Inc., a major bond-rating company in Korea. The data set comprises the bond ratings for the year 2002 and various financial variables for 1,295 companies from the manufacturing industry in Korea. We compared the results of these techniques with one another and with those of traditional methods for credit rating, such as multiple discriminant analysis (MDA), multinomial logistic regression (MLOGIT), and artificial neural networks (ANNs). As a result, we found that DAGSVM with an ordered list was the best approach for the prediction of bond ratings. In addition, we found that the modified version of the ECOC approach can yield higher prediction accuracy for cases showing clear patterns.
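
Three of the six schemes map directly onto scikit-learn's multiclass wrappers, sketched below on synthetic rating-like data; DAGSVM and the Weston-Watkins and Crammer-Singer formulations have no stock wrapper there, so they are omitted. The feature count, class count, and SVM parameters are illustrative, not the study's configuration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import (OneVsOneClassifier, OneVsRestClassifier,
                                OutputCodeClassifier)
from sklearn.svm import SVC

# Synthetic stand-in for the bond-rating data: 10 financial ratios,
# 5 rating grades (the real study used more grades and 1,295 firms).
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=5, n_clusters_per_class=1,
                           random_state=0)

base = SVC(kernel="rbf", gamma="scale")
schemes = {
    "one-vs-one": OneVsOneClassifier(base),
    "one-vs-rest": OneVsRestClassifier(base),
    "ECOC": OutputCodeClassifier(base, code_size=2, random_state=0),
}
for name, clf in schemes.items():
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:12s} mean CV accuracy: {acc:.3f}")
```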

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.1-17
    • /
    • 2019
  • Because stock price forecasting is an important issue both academically and practically, research in stock price prediction has been actively conducted. Stock price forecasting research is classified into work using structured data and work using unstructured data. With structured data such as historical stock prices and financial statements, past studies usually used technical and fundamental analysis. In the big data era, the amount of information has rapidly increased, and artificial intelligence methodologies that can find meaning by quantifying text, an unstructured data type that carries a large amount of information, have developed rapidly. With these developments, many attempts are being made to predict stock prices from online news by applying text mining. The methodology adopted in many papers is to forecast a stock's price using news about the target company. However, according to previous research, not only does news about a target company affect its stock price; news about related companies can also affect it. Finding highly relevant companies is not easy, though, because of market-wide effects and random signals. Thus, existing studies have identified highly relevant companies based primarily on pre-determined international industry classification standards. However, according to recent research, the Global Industry Classification Standard's sectors differ in their internal homogeneity, which leads to a limitation: forecasting stock prices by taking all sector members together, without restricting attention to truly relevant companies, can adversely affect predictive performance. To overcome this limitation, we first used random matrix theory with text mining for stock prediction. When the dimension of the data is large, the classical limit theorems are no longer suitable, because statistical efficiency is reduced; a simple correlation analysis in the financial market therefore does not capture the true correlation. To solve this issue, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and find the true correlation between companies. With the true correlation, we perform cluster analysis to find relevant companies. Based on the clustering analysis, we then use a multiple kernel learning algorithm, an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel is assigned to predict stock prices with features from the financial news of the target firm or its relevant firms. The results of this study are as follows. (1) Following the existing research flow, we confirmed that using news from relevant companies is an effective way to forecast stock prices. (2) Identifying relevant companies in the wrong way can lower prediction performance. (3) The proposed approach based on random matrix theory shows better performance than previous studies when cluster analysis is performed on the true correlation, obtained by removing market-wide effects and random signals. The contribution of this study is as follows. First, this study shows that random matrix theory, used mainly in econophysics, can be combined with artificial intelligence to produce good methodologies. This suggests that it is important not only to develop AI algorithms but also to adopt physics theory, and it extends existing research that integrated artificial intelligence with complex system theory through transfer entropy. Second, this study stressed that finding the right companies in the stock market is an important issue; it is important not only to study artificial intelligence algorithms but also to consider how to theoretically select the input values. Third, we confirmed that firms grouped together under the Global Industry Classification Standard (GICS) may have low relevance, and we suggest that relevance should be defined theoretically rather than simply taken from the GICS.
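
A minimal sketch of the random matrix theory filtering step described above: compare the eigenvalues of the empirical correlation matrix with the Marchenko-Pastur noise band, whose edges are lambda = (1 +/- 1/sqrt(Q))^2 with Q = T/N, then drop the noise modes and the largest (market-wide) mode before clustering. The returns below are synthetic, with one planted five-stock sector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily returns: 50 stocks, 500 days, with a common market
# factor plus one small correlated group, standing in for real data.
n_stocks, n_days = 50, 500
market = rng.normal(0, 1, n_days)
returns = 0.3 * market + rng.normal(0, 1, (n_stocks, n_days))
returns[:5] += 0.7 * rng.normal(0, 1, n_days)   # a 5-stock "sector"

R = np.corrcoef(returns)
eigvals, eigvecs = np.linalg.eigh(R)

# Marchenko-Pastur bounds: eigenvalues inside [lmin, lmax] are
# indistinguishable from noise for a purely random matrix.
q = n_days / n_stocks
lmax = (1 + 1 / np.sqrt(q)) ** 2
lmin = (1 - 1 / np.sqrt(q)) ** 2

# Keep only the informative part of the spectrum: drop noise modes
# and the largest eigenvalue (the market-wide mode), as RMT-based
# studies typically do before clustering.
signal = eigvals > lmax
signal[np.argmax(eigvals)] = False   # remove the market mode
C_filtered = (eigvecs[:, signal] * eigvals[signal]) @ eigvecs[:, signal].T
np.fill_diagonal(C_filtered, 1.0)

print("eigenvalues above the noise band:", eigvals[eigvals > lmax].round(2))
off_diag = C_filtered[:5, :5][~np.eye(5, dtype=bool)]
print("mean filtered correlation inside planted sector:",
      off_diag.mean().round(3))
```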

Outlier Detection Using an Autoencoder for Ocean Observation Data (해양 이상 자료 탐지를 위한 오토인코더 활용 기법 최적화 연구)

  • Kim, Hyeon-Jae;Kim, Dong-Hoon;Lim, Chaewook;Shin, Yongtak;Lee, Sang-Chul;Choi, Youngjin;Woo, Seung-Buhm
    • Journal of Korean Society of Coastal and Ocean Engineers
    • /
    • v.33 no.6
    • /
    • pp.265-274
    • /
    • 2021
  • Outlier detection research on ocean data has traditionally been performed using statistical and distance-based machine learning algorithms. Recently, AI-based methods have received a lot of attention, and so-called supervised learning methods that require classification information for the data are mainly used. Supervised learning requires a lot of time and cost because classification information (labels) must be manually assigned to all the data required for learning. In this study, an autoencoder based on unsupervised learning was applied as an outlier detection method to overcome this problem. Two experiments were designed: one with univariate learning, in which only SST was used from the Deokjeok Island observation data, and the other with multivariate learning, in which SST, air temperature, wind direction, wind speed, air pressure, and humidity were used. The data span 25 years, from 1996 to 2020, and pre-processing that considers the characteristics of ocean data was applied. We then tried to detect outliers in real SST data using the trained univariate and multivariate autoencoders. To compare model performance, various outlier detection methods were applied to synthetic data with artificially inserted errors. Quantitative evaluation showed that the multivariate and univariate accuracies were about 96% and 91%, respectively, indicating that the multivariate autoencoder had better outlier detection performance. Outlier detection using an unsupervised learning-based autoencoder is expected to be useful in various settings, in that it can reduce subjective classification errors and the cost and time required for data labeling.
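
A minimal sketch of the unsupervised approach described above: a small dense autoencoder is trained to reconstruct multivariate records, and points with unusually large reconstruction error are flagged as outliers. The six input variables echo the paper's multivariate setup, but the data, architecture, and the 99th-percentile cutoff are illustrative choices, not the study's configuration.

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(3)

# Synthetic stand-in for (SST, air temp, wind dir, wind speed,
# pressure, humidity) records; real inputs would be the Deokjeok
# Island series after ocean-specific pre-processing.
n, n_feat = 5000, 6
X = rng.normal(0, 1, (n, n_feat)).astype("float32")
X[:50] += rng.normal(0, 5, (50, n_feat)).astype("float32")  # planted outliers

# Small dense autoencoder: compress to 2 latent units and reconstruct.
# Training includes the 1% planted contamination, a common setting.
model = keras.Sequential([
    keras.layers.Input(shape=(n_feat,)),
    keras.layers.Dense(4, activation="relu"),
    keras.layers.Dense(2, activation="relu"),
    keras.layers.Dense(4, activation="relu"),
    keras.layers.Dense(n_feat),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, X, epochs=10, batch_size=64, verbose=0)

# Flag points whose reconstruction error exceeds the 99th percentile.
err = np.mean((X - model.predict(X, verbose=0)) ** 2, axis=1)
outliers = err > np.quantile(err, 0.99)
print("flagged", int(outliers.sum()), "of", n, "records as outliers")
print("planted outliers caught:", int(outliers[:50].sum()), "of 50")
```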