• Title/Summary/Keyword: second-order accuracy

Search results: 563

Bankruptcy prediction using an improved bagging ensemble (개선된 배깅 앙상블을 활용한 기업부도예측)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.121-139
    • /
    • 2014
  • Predicting corporate failure has been an important topic in accounting and finance. The costs associated with bankruptcy are high, so the accuracy of bankruptcy prediction is of great importance to financial institutions. Many researchers have dealt with bankruptcy prediction over the past three decades. The current research attempts to use ensemble models to improve the performance of bankruptcy prediction. Ensemble classification combines individually trained classifiers to obtain predictions more accurate than those of individual models. Ensemble techniques have been shown to be very useful for improving the generalization ability of a classifier. Bagging is the most commonly used method for constructing ensemble classifiers. In bagging, different training data subsets are randomly drawn with replacement from the original training dataset, and base classifiers are trained on the different bootstrap samples. Instance selection selects critical instances while removing irrelevant and harmful instances from the original set. Instance selection and bagging are both well known in data mining; however, few studies have dealt with their integration. This study proposes an improved bagging ensemble based on instance selection using genetic algorithms (GA) for improving the performance of SVM. GA is an efficient optimization procedure based on the theory of natural selection and evolution. GA uses the idea of survival of the fittest by progressively accepting better solutions to the problem. GA searches by maintaining a population of solutions from which better solutions are created, rather than by making incremental changes to a single solution. The initial solution population is generated randomly and evolves into the next generation through genetic operators such as selection, crossover and mutation. The solutions, coded as strings, are evaluated by the fitness function.
The proposed model consists of two phases: GA-based instance selection and instance-based bagging. In the first phase, GA is used to select the optimal instance subset that serves as input data for the bagging model. In this study, the chromosome is encoded as a binary string representing the instance subset. The population size was set to 100, the maximum number of generations to 150, and the crossover and mutation rates to 0.7 and 0.1 respectively. We used the prediction accuracy of the model as the fitness function of the GA: an SVM model is trained on the training data set using the selected instance subset, and its prediction accuracy over the test data set is used as the fitness value in order to avoid overfitting. In the second phase, we used the optimal instance subset selected in the first phase as input data for the bagging model, with SVM as the base classifier and majority voting as the combining method. This study applies the proposed model to the bankruptcy prediction problem using a real data set from Korean companies. The research data contain 1,832 externally non-audited firms: 916 bankruptcy cases and 916 non-bankruptcy cases. Financial ratios categorized as stability, profitability, growth, activity and cash flow were investigated through a literature review and basic statistical methods, and 8 financial ratios were selected as the final input variables. We separated the whole data into training, test and validation sets. We compared the proposed model with several comparative models, including the simple individual SVM model, the simple bagging model and the instance-selection-based SVM model. McNemar tests were used to examine whether the proposed model significantly outperforms the other models. The experimental results show that the proposed model outperforms the other models.
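The two-phase procedure described above can be sketched in miniature. This is an illustration, not the authors' code: a nearest-centroid classifier stands in for the paper's SVM, toy synthetic data replace the Korean financial ratios, and the GA budget is far smaller than the paper's population of 100 and 150 generations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class data standing in for the 8 financial ratios (hypothetical).
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
idx = rng.permutation(80)
X, y = X[idx], y[idx]
train, test = np.arange(0, 60), np.arange(60, 80)

def fitness(mask):
    """Accuracy on the held-out set when training only on the selected
    instances (nearest-centroid stands in for the paper's SVM)."""
    sel = train[mask.astype(bool)]
    if len(sel) == 0 or len(np.unique(y[sel])) < 2:
        return 0.0
    c0 = X[sel][y[sel] == 0].mean(axis=0)
    c1 = X[sel][y[sel] == 1].mean(axis=0)
    pred = (np.linalg.norm(X[test] - c1, axis=1)
            < np.linalg.norm(X[test] - c0, axis=1)).astype(int)
    return (pred == y[test]).mean()

# Phase 1: GA over binary chromosomes, one gene per training instance.
pop = rng.integers(0, 2, (20, len(train)))
for gen in range(30):
    order = np.argsort([fitness(c) for c in pop])[::-1]
    parents = pop[order[:10]]                      # selection
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(0, 10, 2)]
        cut = rng.integers(1, len(train))          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child[rng.random(len(train)) < 0.1] ^= 1   # mutation
        children.append(child)
    pop = np.vstack([parents, children])
best = pop[int(np.argmax([fitness(c) for c in pop]))]

# Phase 2: bagging over the GA-selected subset with majority voting.
sel = train[best.astype(bool)]
votes = np.zeros((5, len(test)))
for b in range(5):
    boot = rng.choice(sel, size=len(sel), replace=True)
    c0 = X[boot][y[boot] == 0].mean(axis=0)
    c1 = X[boot][y[boot] == 1].mean(axis=0)
    votes[b] = (np.linalg.norm(X[test] - c1, axis=1)
                < np.linalg.norm(X[test] - c0, axis=1))
ensemble_pred = (votes.mean(axis=0) > 0.5).astype(int)
```

The key design point mirrors the abstract: the GA prunes the training set first, and only the surviving instances are bootstrapped for the ensemble.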

Accurate Quality Control Method of Bone Mineral Density Measurement -Focus on Dual Energy X-ray Absorptiometry- (골밀도 측정의 정확한 정도관리방법 -이중 에너지 방사선 흡수법을 중심으로-)

  • Kim, Ho-Sung;Dong, Kyung-Rae;Ryu, Young-Hwan
    • Journal of radiological science and technology
    • /
    • v.32 no.4
    • /
    • pp.361-370
    • /
    • 2009
  • The image quality management of bone mineral density is the responsibility and duty of the radiologists who carry out the examinations. However, inaccurate conclusions due to a lack of understanding of quality-management methodology can be a fatal error for the patient. Therefore, the objective of this paper is to explain proper image quality management and enumerate methods for examiners and patients, thereby ensuring the reliability of bone mineral density exams. The accuracy and precision of bone mineral density measurements must be at the highest level so that actual biological changes can be detected even when changes in bone mineral density are slight. Accuracy and precision should be continuously maintained as part of equipment quality management; these factors ensure the reliability of bone mineral density exams. For proper equipment management, the equipment is calibrated each morning; then a phantom recommended by the manufacturer is measured ten to twenty-five times to establish a mean value, with a permissible range of ±1.5% set as the standard. Phantom measurements are needed daily, or at least three times a week, to confirm whether the measured bone mineral density values have changed. In addition, bone mineral density measurements were evaluated and recorded following the rules of the Shewhart control chart. This type of management must be conducted whenever equipment is installed or moved. For the management of inspectors, measurement precision was evaluated by testing the reproducibility of measurements in the absence of any real biological change during re-examination. The precision assessment was performed by measuring thirty patients twice or fifteen patients three times.
An important point when taking measurements was that, whether for the second or third examination, the patient was required to get off the table and then be repositioned. With a 95% confidence level, the least significant change (LSC) is 2.77 times the precision error of the bone mineral density measurements; when a measured change exceeds this value, it can be regarded as genuine biological change. Quality management must be carried out continuously, from the initial inspection through any moving or relocation of the equipment, in order to remain effective. Proper quality control by the radiologists performing bone mineral density inspections extends the service life of the equipment, yields accurate results, and helps ensure reliable inspections.
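The 2.77 factor quoted above (≈ 1.96 × √2, for the difference of two measurements at 95% confidence) can be applied as a short calculation. A minimal sketch with invented numbers, not real precision-study data:

```python
import math

def rms_sd(per_patient_sds):
    """Precision error as the root-mean-square SD over the patients
    measured repeatedly in the precision study."""
    return math.sqrt(sum(s * s for s in per_patient_sds) / len(per_patient_sds))

def least_significant_change(precision_error, z=1.96):
    """LSC at 95% confidence = 1.96 * sqrt(2) * precision error (~2.77x).
    Only a measured change larger than the LSC should be read as
    genuine biological change."""
    return z * math.sqrt(2) * precision_error

# Hypothetical per-patient SDs (g/cm^2) from a repeat-measurement study.
precision = rms_sd([0.010, 0.012, 0.008, 0.011])
lsc = least_significant_change(precision)
```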


Impact of Lambertian Cloud Top Pressure Error on Ozone Profile Retrieval Using OMI (램버시안 구름 모델의 운정기압 오차가 OMI 오존 프로파일 산출에 미치는 영향)

  • Nam, Hyeonshik;Kim, Jae Hawn;Shin, Daegeun;Baek, Kanghyun
    • Korean Journal of Remote Sensing
    • /
    • v.35 no.3
    • /
    • pp.347-358
    • /
    • 2019
  • The Lambertian cloud model is a simplified cloud model used to effectively retrieve the vertical ozone distribution of an atmosphere in which clouds exist. In the Lambertian cloud model, the optical characteristics of clouds required for radiative transfer simulation are parametrized by the Optical Centroid Cloud Pressure (OCCP) and the Effective Cloud Fraction (ECF), and the accuracy of each parameter greatly affects the accuracy of the radiance simulation. However, it is very difficult to generalize the vertical ozone error due to OCCP error, because it varies with the radiative environment and the algorithm settings. It is also difficult to isolate the effect of OCCP error because it is mixed with the other errors that occur in the ozone retrieval process. This study analyzed the ozone retrieval error due to OCCP error using two methods. First, we simulated the impact of OCCP error on ozone retrieval based on optimal estimation. Using the LIDORT radiative transfer model, the radiance error due to the OCCP error was calculated. To convert the radiance error into an ozone retrieval error, the radiance error was substituted into the conversion equation of the optimal estimation method. The results show that an OCCP error of 100 hPa leads to an overestimation of total ozone by 2.7%. Second, a case analysis was carried out to find the ozone retrieval error due to OCCP error. For the case analysis, the ozone retrieval error was simulated assuming an OCCP error and compared with the ozone error in cases from PROFOZ 2005-2006, an OMI ozone profile product. To define the ozone error in each case, an idealized assumption was adopted; considering albedo and the horizontal variation of ozone needed to satisfy the assumption, 49 cases were selected. As a result, 27 of the 49 cases (about 55%) showed a correlation of 0.5 or more. These results show that OCCP error has a significant influence on the accuracy of ozone profile retrieval.
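The "conversion equation" step, which maps a radiance error into an ozone retrieval error, can be sketched with the standard optimal-estimation gain matrix (the paper's exact formulation may differ). The dimensions, Jacobian and covariances below are arbitrary stand-ins, not OMI values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dimensions: m radiance wavelengths, n ozone layers (illustrative).
m, n = 12, 6
K = rng.normal(0, 1, (m, n))      # weighting-function (Jacobian) matrix
S_e = 0.01 * np.eye(m)            # measurement-noise covariance
S_a = 1.00 * np.eye(n)            # a priori covariance

# Optimal-estimation gain matrix: G = (K^T Se^-1 K + Sa^-1)^-1 K^T Se^-1
Se_inv = np.linalg.inv(S_e)
G = np.linalg.inv(K.T @ Se_inv @ K + np.linalg.inv(S_a)) @ K.T @ Se_inv

# A radiance perturbation caused by an OCCP error maps linearly,
# through G, to a perturbation of the retrieved ozone profile.
delta_y = rng.normal(0, 0.05, m)  # simulated radiance error
delta_x = G @ delta_y             # resulting ozone retrieval error
```

Because the mapping is linear, doubling the radiance error doubles the retrieved-ozone error, which is what makes this a convenient error-propagation tool.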

Treatment of Extremely High Risk and Resistant Gestational Trophoblastic Neoplasia Patients in King Chulalongkorn Memorial Hospital

  • Oranratanaphan, Shina;Lertkhachonsuk, Ruangsak
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.15 no.2
    • /
    • pp.925-928
    • /
    • 2014
  • Background: Gestational trophoblastic neoplasia (GTN) is a spectrum of diseases with abnormal trophoblastic proliferation. Treatment is based on FIGO stage and WHO risk factor scores. Patients whose score is 12 or more are considered at extremely high risk, with a high likelihood of resistance to first-line treatment; optimal therapy is therefore controversial. Objective: This study was conducted to summarize the regimens used for extremely high risk or resistant GTN patients in our institution in the past 10 years. Materials and Methods: All the charts of GTN patients classified as extremely high risk, recurrent or resistant from 1 January 2002 to 31 December 2011 were reviewed. Criteria for the diagnosis of GTN were also assessed to confirm the diagnosis, and FIGO stage and WHO risk prognostic score were re-calculated to ensure the accuracy of the information. Patient characteristics were reviewed in terms of age, weight, height, BMI, presenting symptoms, metastatic areas, lesions, FIGO stage, WHO risk factor score, serum hCG level, treatment regimen, adjuvant treatments, side effects and response to treatment, including disease-free survival. Results: Eight patients meeting the criteria of extremely high risk or resistant GTN were included in this review. Mean age was 33.6 years (SD=13.5, range 17-53). Of the total, 3 were stage III (37.5%) and 5 were stage IV (62.5%). Mean duration from the previous pregnancy to GTN was 17.6 months (SD 9.9). Mean serum hCG level was 864,589 mIU/ml (SD 98,151). Presenting symptoms were varied, such as hemoptysis, abdominal pain, headache, heavy vaginal bleeding and stroke. The most commonly used first-line chemotherapeutic regimen in our institution was the VAC regimen, which was given to 4 of the 8 patients in this study. The most common second-line chemotherapy was EMA-CO. Adjuvant radiation was given to most of the patients who had brain metastases.
Most of the patients had to delay chemotherapy for 1-2 weeks due to grade 2-3 leukopenia and required G-CSF rescue for neutropenia. Five of the 8 patients survived; mean disease-free survival was 20.4 months. Two patients died of the disease, while one patient died from sepsis arising from a pressure-sore wound. None of the surviving patients developed recurrence after completing treatment. Conclusions: In extremely high risk GTN patients, the main treatment is multi-agent chemotherapy. In our institution, we usually use VAC as first-line treatment of high risk GTN, but since resistance is quite common, this may not be suitable for extremely high risk GTN patients. The most commonly used second-line multi-agent chemotherapy in our institution is EMA-CO. Adjuvant brain radiation was administered to most of the patients with brain metastases. Our treatment differed from that of other institutions, but survival is comparable to previous reviews. The limitation of this review is that the number of cases is small due to the rarity of the disease. Further trials or multicenter analyses may be considered.

Influences on Time and Spatial Characteristics of Soccer Pass Success Rate: A Case Study of the 2018 World Cup in Russia (시간과 공간적 특성에 따른 축구 패스 성공률 분석: 2018 러시아 월드컵 대회 자료를 중심으로)

  • Lee, Seung-Hun;Kim, Young-Hoon
    • Journal of Digital Convergence
    • /
    • v.19 no.1
    • /
    • pp.475-483
    • /
    • 2021
  • The purpose of this study is to identify the temporal and spatial characteristics of pass accuracy by utilizing secondarily processed data and official records derived from video data of the 2018 FIFA World Cup in Russia. For all 128 matches, pass success rates by match result, passing time and passing position were analyzed with a two-way repeated-measures ANOVA. The results showed no difference between the winning and losing groups, and no interaction effects were found for passing time and position. By time period, success rates were highest in the middle of each half: 79.2% in the 15~30 minute interval of the first half and 77.9% in the 60~75 minute interval of the second half. By position, pass success rates were, in descending order, the defense-midfield area (83.9%), the midfield-attack area (81.7%), the defense area (70.6%) and the attack area (61.1%). In conclusion, given the relatively even competitive strength of World Cup matches, there was no difference in pass success rate between winning and losing teams depending on these characteristics, and follow-up research analyzing game content, rather than win-loss factors alone, appears to be needed.
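The aggregation behind percentages like these, pass success rate grouped by pitch zone and by 15-minute interval, can be sketched as follows. The event tuples are invented examples, not World Cup data, and the grouping keys are illustrative:

```python
from collections import defaultdict

# Hypothetical pass-event records: (time_min, zone, success)
events = [
    (10, "defense-midfield", True), (20, "midfield-attack", True),
    (22, "attack", False), (65, "defense", True), (70, "attack", True),
    (75, "defense-midfield", False), (18, "midfield-attack", True),
]

def success_rate(events, key):
    """Group pass events by a key function and return success rates (%)."""
    made, total = defaultdict(int), defaultdict(int)
    for t, zone, ok in events:
        k = key(t, zone)
        total[k] += 1
        made[k] += ok
    return {k: round(100 * made[k] / total[k], 1) for k in total}

by_zone = success_rate(events, lambda t, z: z)
by_period = success_rate(
    events, lambda t, z: f"{(t // 15) * 15}-{(t // 15) * 15 + 15} min")
```

The study then compares such grouped rates with a repeated-measures ANOVA; that inferential step is omitted here.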

A Study on the Analysis of Park User Experiences in Phase 1 and 2 Korea's New Towns with Blog Text Data (블로그 텍스트 데이터를 활용한 1, 2기 신도시 공원의 이용자 경험 분석 연구)

  • Sim, Jooyoung;Lee, Minsoo;Choi, Hyeyoung
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.52 no.3
    • /
    • pp.89-102
    • /
    • 2024
  • This study aims to examine the characteristics of the user experience of New Town neighborhood parks and explore issues that diversify the experience of the parks. In order to quantitatively analyze a large amount of park visitors' experiences, text-based Naver blog reviews were collected and analyzed. Among the Phase 1 and 2 New Towns, the park with the highest number of user-experience postings was selected for each city as the target of analysis. Blog text data were collected from May 20, 2003, to May 31, 2022, and the analysis targeted Ilsan Lake Park, Bundang Yuldong Park, Gwanggyo Lake Park, and Dongtan Lake Park. First, the findings revealed that all four parks were used for everyday relaxation and recreation. Second, the analysis underscored the parks' diverse user groups. Third, programs in areas near the parks were also related to park usage. Fourth, the words within the top 20 rankings represented distinctive park elements or content/programs specific to each park. Lastly, the results of the network analysis delineated four overarching types of park users, and the networks of the four user types differed from park to park. This study provides two implications. First, in addition to naturalistic characteristics, the differentiation of each park's unique facilities and programs greatly improves public awareness and enriches the individual park experience. Second, if analysis of the spatial context surrounding each park were performed in addition to text analysis, the accuracy of interpreting the text-analysis results could be improved. The results of this study can be used in the planning and design of parks and greenspaces in the Phase 3 New Towns currently in progress.
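The frequency-ranking and network steps of such a blog-text analysis can be sketched as below. The token lists are invented stand-ins for the morpheme-analyzed Naver blog corpus, and posts serve as the co-occurrence context:

```python
from collections import Counter
from itertools import combinations

# Hypothetical tokenized blog posts about one park (illustrative only).
posts = [
    ["lake", "walk", "sunset", "picnic"],
    ["walk", "dog", "lake"],
    ["picnic", "family", "walk"],
    ["festival", "lake", "night"],
]

# Top-ranked words, as in the study's top-20 frequency tables.
freq = Counter(w for p in posts for w in p)
top = freq.most_common(3)

# Co-occurrence edges for a simple word network (one edge count
# per pair of distinct words appearing in the same post).
edges = Counter()
for p in posts:
    for a, b in combinations(sorted(set(p)), 2):
        edges[(a, b)] += 1
```

Clustering such an edge list is what yields user-type groupings like the four types reported in the study.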

Fluid bounding effect on FG cylindrical shell using Hankel's functions of second kind

  • Khaled Mohamed Khedher;Shahzad Ali Chattah;Mohammad Amien Khadimallah;Ikram Ahmad;Muzamal Hussain;Rana Muhammad Akram Muntazir;Mohamed Abdelaziz Salem;Ghulam Murtaza;Faisal Al-Thobiani;Muhammad Naeem Mohsin;Abeera Talib;Abdelouahed Tounsi
    • Advances in nano research
    • /
    • v.16 no.6
    • /
    • pp.565-577
    • /
    • 2024
  • Vibration of fluid-filled functionally graded cylindrical shells with ring supports is investigated here. The shell equations of motion are framed using Sanders' first-order shell theory. These equations are partial differential equations, which are usually solved by approximate techniques; robust and efficient techniques are favored to obtain precise results. Employment of the Rayleigh-Ritz procedure yields the shell frequency equation. The acoustic wave equation is used to incorporate the sound pressure produced in the fluid, and Hankel functions of the second kind describe the fluid influence. Mathematically, the integral form of the Lagrange energy functional is converted into a set of three partial differential equations. The cylindrical shell is immersed in a non-viscous fluid and stiffened by rings in the tangential direction. For isotropic materials the physical properties are the same everywhere, whereas for laminated and functionally graded materials they vary from point to point; here the shell material is taken as functionally graded. Ring supports are located at various positions along the axial direction, acting round the shell circumference, and their influence is investigated at each position. The effect of ring supports for empty and fluid-filled shells is presented using the Rayleigh-Ritz method with simply supported conditions. The frequency behavior of empty and fluid-filled cylindrical shells with ring supports is investigated versus the circumferential and axial wave numbers, and the variations are also plotted against the locations of the ring supports for length-to-radius and height-to-radius ratios. The frequency first increases, attains its maximum near the mid-length of the shell, and then decreases. It is found that introducing the fluid term lowers the frequencies compared with those of the empty cylinder. The effect on the frequencies of varying the shell surfaces between stainless steel and nickel as constituent materials is also exhibited. To generate the fundamental natural frequencies with better accuracy and effectiveness, the computer software MATLAB is used.
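The role of the Hankel function of the second kind in the fluid coupling can be summarized by the standard acoustic relations below (with an $e^{i\omega t}$ time convention, under which $H_n^{(2)}$ represents the outgoing wave; the paper's exact nondimensionalization may differ):

```latex
% Acoustic pressure of the outgoing wave in the fluid, expressed with
% the Hankel function of the second kind:
p(r,\theta,z,t) = p_n\, H_n^{(2)}(k_r r)\cos(n\theta)\, e^{\,i(\omega t - k_z z)},
\qquad H_n^{(2)}(x) = J_n(x) - i\, Y_n(x),
% where the acoustic wave equation fixes the radial wavenumber:
k_r^2 = \frac{\omega^2}{c_f^2} - k_z^2 .
```

Matching this pressure to the shell's radial motion at the wetted surface is what introduces the fluid-loading term into the Rayleigh-Ritz frequency equation.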

Development of New Variables Affecting Movie Success and Prediction of Weekly Box Office Using Them Based on Machine Learning (영화 흥행에 영향을 미치는 새로운 변수 개발과 이를 이용한 머신러닝 기반의 주간 박스오피스 예측)

  • Song, Junga;Choi, Keunho;Kim, Gunwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.4
    • /
    • pp.67-83
    • /
    • 2018
  • The Korean film industry, which had been growing significantly every year, finally exceeded 200 million cumulative admissions in 2013. However, starting from 2015 the industry entered a period of low growth and experienced negative growth in 2016. To overcome this difficulty, stakeholders such as production companies, distribution companies and multiplex chains have attempted to maximize market returns with strategies that predict market change and respond to it immediately. Since a film is an experiential product, it is not easy to predict its box office record and initial number of admissions before release, and the number of admissions fluctuates with a variety of factors after release. Production and distribution companies therefore try to secure a guaranteed number of screens from multiplex chains at the opening of a newly released film. The multiplex chains, however, tend to publish the screening schedule only one week at a time, and then determine the number of screenings for the forthcoming week based on the box office record and audience evaluations. Many previous studies have dealt with the prediction of box office records. Early studies attempted to identify the factors affecting the box office record; more recent studies have applied various analytic techniques to the previously identified factors in order to improve prediction accuracy and explain the effect of each factor, rather than identify new factors. However, most previous studies share the limitation of using the total number of admissions from opening to closing as the target variable, which makes it difficult to predict and respond to dynamically changing market demand.
Therefore, the purpose of this study is to predict the weekly number of admissions of a newly released film so that stakeholders can respond flexibly and elastically to changes in its audience. To that end, we considered the factors affecting box office used in previous studies and developed new factors not used previously, such as the order of opening of movies and the dynamics of sales. With this comprehensive set of factors, we used machine learning methods such as Random Forest, Multi-Layer Perceptron, Support Vector Machine and Naive Bayes to predict the cumulative number of admissions from the first to the third week after release. At the first and second weeks, we predicted the cumulative number of admissions for the forthcoming week of a released film; at the third week, we predicted the film's total number of admissions. In addition, we also predicted the total number of cumulative admissions at both the first and second weeks using the same factors. As a result, we found that the accuracy of predicting the number of admissions for the forthcoming week was higher than that of predicting the total number in all three weeks, and that Random Forest achieved the highest accuracy among the machine learning methods used. This study has implications in that it 1) comprehensively considered various factors affecting the box office record that were rarely addressed in previous research, such as the weekly audience rating, the weekly rank and the weekly sales share after release, and 2) suggested models that predict the weekly number of admissions of newly released films, enabling stakeholders to respond flexibly and elastically to dynamically changing market demand.
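The weekly feature construction described above can be sketched as follows. The field names and the retention-based baseline are illustrative inventions, not the paper's actual variables or models (which use Random Forest and other learners):

```python
# Hypothetical weekly records for one film: (week, audience, sales_share, rank)
weeks = [(1, 1_200_000, 0.34, 1), (2, 700_000, 0.21, 2)]

def weekly_features(weeks):
    """Build dynamic weekly features of the kind fed to the learners;
    the names here are illustrative only."""
    feats = {}
    for w, aud, share, rank in weeks:
        feats[f"aud_w{w}"] = aud
        feats[f"share_w{w}"] = share
        feats[f"rank_w{w}"] = rank
    if len(weeks) >= 2:
        # "dynamics of sales": week-over-week audience retention
        feats["retention"] = weeks[-1][1] / weeks[-2][1]
    return feats

def naive_next_week(weeks):
    """Trivial baseline standing in for the ML models: carry the latest
    week-over-week retention forward one more week."""
    f = weekly_features(weeks)
    return int(weeks[-1][1] * f.get("retention", 1.0))

pred_week3 = naive_next_week(weeks)
```

A learned model replaces the naive carry-forward rule, but the input shape, per-week observations plus derived dynamics, is the same.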

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.29-45
    • /
    • 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. The statistical techniques traditionally used in bond rating include multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT) and probit analysis. However, one major drawback is that they rest on strict assumptions: linearity, normality, independence among predictor variables and pre-existing functional forms relating the criterion variables and the predictor variables. Those strict assumptions have limited their application to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN) and the Support Vector Machine (SVM). SVM in particular is recognized as a new and promising classification and regression method. SVM learns a separating hyperplane that maximizes the margin between two categories. It is simple enough to be analyzed mathematically and achieves high performance in practical applications. SVM implements the structural risk minimization principle and searches to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, so overfitting is unlikely to occur. SVM also does not require many training samples, since it builds prediction models using only the representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance.
First, SVM was originally proposed for solving binary classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well in multi-class problems as SVM does in binary classification. Second, approximation algorithms (e.g. decomposition methods, the sequential minimal optimization algorithm) can be used to reduce the computation time of multi-class problems, but they may deteriorate classification performance. Third, multi-class prediction suffers from the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers that in another. Such data sets often produce a default classifier with a skewed boundary and thus reduced classification accuracy. SVM ensemble learning is one machine learning approach to cope with these drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the widely used ensemble learning techniques: it constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations incorrectly predicted by previous classifiers are chosen more often than those correctly predicted, so boosting produces new classifiers better able to predict the examples on which the current ensemble performs poorly. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can consider geometric mean-based accuracy and errors across the classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross validation was performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross validation, the entire data set is first partitioned into ten equal-sized sets, and each set is in turn used as the test set while the classifier trains on the other nine sets; cross-validated folds are thus tested independently for each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In terms of arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) shows higher accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds differs significantly; the results indicate that the performance of MGM-Boost differs significantly from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
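The geometric-mean notion that distinguishes MGM-Boost from plain accuracy can be illustrated on an imbalanced toy example. The labels below are invented, and the boosting loop itself is omitted; only the two accuracy measures are shown:

```python
import math

def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each class."""
    acc = {}
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        acc[c] = sum(y_pred[i] == c for i in idx) / len(idx)
    return acc

def arithmetic_mean_acc(y_true, y_pred):
    """Overall hit rate, which majority classes can dominate."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def geometric_mean_acc(y_true, y_pred):
    """Geometric mean of per-class accuracies: it collapses to 0 if any
    class is missed entirely, so it penalizes imbalance, which is the
    intent behind MGM-Boost's geometric mean-based measure."""
    accs = list(per_class_accuracy(y_true, y_pred).values())
    return math.prod(accs) ** (1 / len(accs))

# Imbalanced toy ratings: the predictor ignores the minority class "C".
y_true = ["A"] * 8 + ["B"] * 8 + ["C"] * 2
y_pred = ["A"] * 8 + ["B"] * 8 + ["A"] * 2
```

Here the arithmetic accuracy looks high (16 of 18 correct) while the geometric mean is zero, exactly the gap the boosting variant targets.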

Skew Compensation and Text Extraction of the Traffic Sign in Natural Scenes (자연영상에서 교통 표지판의 기울기 보정 및 텍스트 추출)

  • Choi Gyu-Dam;Kim Sung-Dong;Choi Ki-Ho
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.3 no.2 s.5
    • /
    • pp.19-28
    • /
    • 2004
  • This paper shows how to compensate the skew of a traffic sign included in a natural image and extract its text. The whole process comprises four steps applied to the acquired image. In the first part, we perform preprocessing and Canny edge extraction on the natural image. In the second part, we perform preprocessing and postprocessing for the Hough transform in order to extract the skew angle. In the third part, we remove noise and complex lines, and then extract the candidate region using the features of text. In the last part, after performing local binarization in the extracted candidate region, we extract the text by using the feature differences between text and non-text to discard the unnecessary non-text. In an experiment with 100 natural images that include traffic signs, the method shows an 82.54 percent text extraction rate and a 79.69 percent extraction accuracy, more accurate text extraction than existing works such as the methods using RLS (Run Length Smoothing) or the Fourier transform. It also achieves a 94.5 percent success rate in extracting the skew angle, an improvement of 26 percent compared with using the Hough transform alone. The research can be applied to providing location information in walking-aid systems for the blind or in the operation of driverless vehicles.
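The Hough-transform step for skew-angle extraction can be sketched in miniature as below. A synthetic edge line stands in for the Canny output, and the vote accumulator is deliberately simple; a real implementation would work on actual edge maps with sub-degree resolution:

```python
import numpy as np

# Synthetic edge pixels along a line skewed ~10 degrees from horizontal
# (a stand-in for the Canny edges of a sign border).
true_skew = 10.0
xs = np.arange(0, 50)
ys = np.round(np.tan(np.radians(true_skew)) * xs).astype(int)
points = list(zip(xs.tolist(), ys.tolist()))

def hough_skew(points, angle_res=1.0):
    """Minimal Hough transform: vote over (theta, rho) line parameters
    x*cos(theta) + y*sin(theta) = rho and return the angle of the
    dominant line's normal, in degrees."""
    thetas = np.radians(np.arange(-90.0, 90.0, angle_res))
    votes = {}
    for x, y in points:
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        for t_idx, r in enumerate(rhos):
            votes[(t_idx, int(r))] = votes.get((t_idx, int(r)), 0) + 1
    best_bin = max(votes, key=votes.get)
    return float(np.degrees(thetas[best_bin[0]]))

angle = hough_skew(points)
# For a near-horizontal line, skew from horizontal = 90 - |normal angle|;
# rotating the image by -skew compensates the sign's tilt.
skew = 90.0 - abs(angle)
```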
