• Title/Summary/Keyword: Arithmetic Mean


Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings with statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rest on strict assumptions. These assumptions include linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. Such strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur with SVM. Furthermore, SVM does not require many training samples, since it builds prediction models using only the representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance. 
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well in multi-class classification as SVM does in binary-class classification. Second, approximation algorithms (e.g. decomposition methods, the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class problems, but they can deteriorate classification performance. Third, multi-class prediction suffers from the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often produce a default classifier with a skewed boundary, which reduces classification accuracy. SVM ensemble learning is one machine learning approach for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on the misclassified observations through iterations. The observations that are incorrectly predicted by previous classifiers are chosen more often than examples that are correctly predicted. Thus boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem. 
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can take into account the geometric mean-based accuracy and errors across classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not arise by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we have obtained results for the classifiers on each of the 30 experiments. In terms of arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
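
The gap between the arithmetic and geometric accuracy figures above can be illustrated with a small sketch. The abstract does not define either measure, so this assumes the arithmetic figure is the overall hit rate and the geometric figure is the geometric mean of per-class recalls; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
from math import prod


def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        if t == p:
            hits[t] = hits.get(t, 0) + 1
    return {label: hits.get(label, 0) / n for label, n in totals.items()}


def arithmetic_accuracy(y_true, y_pred):
    """Overall fraction of correctly classified samples (assumed definition)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition)."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return prod(recalls) ** (1 / len(recalls))


# Made-up, imbalanced labels: class "A" dominates and class "C" is rare.
y_true = ["A"] * 8 + ["B"] * 3 + ["C"] * 1
y_pred = ["A"] * 8 + ["B", "B", "A"] + ["A"]  # the single "C" is missed

print(arithmetic_accuracy(y_true, y_pred))      # 10 of 12 samples are correct
print(geometric_mean_accuracy(y_true, y_pred))  # recall of "C" is 0, so this is 0
```

Under these assumptions a classifier can look acceptable on the arithmetic measure while the geometric measure collapses as soon as one minority class is ignored, which is the kind of behaviour an imbalance-aware boosting scheme tries to penalize.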

Lung cancer, chronic obstructive pulmonary disease and air pollution (대기오염에 의한 폐암 및 만성폐색성호흡기질환 -개인 흡연력을 보정한 만성건강영향평가-)

  • Sung, Joo-Hon;Cho, Soo-Hun;Kang, Dae-Hee;Yoo, Keun-Young
    • Journal of Preventive Medicine and Public Health / v.30 no.3 s.58 / pp.585-598 / 1997
  • Background: Although there are growing concerns about the adverse health effects of air pollution, little evidence on the health effects of current air pollution levels has been accumulated in Korea. This study was designed to evaluate the chronic health effects of air pollution using Korean Medical Insurance Corporation (KMIC) data and air quality data. Medical insurance data in Korea have some drawbacks in accuracy, but they have strengths in their national coverage and in having a unified ID system and individual information, which enable various data linkages and the study of chronic health effects. Method: This study utilized the data of the Korean Environmental Surveillance System Study (Surveillance Study), which consist of asthma, acute bronchitis, chronic obstructive pulmonary diseases (COPD), cardiovascular diseases (congestive heart failure and ischemic heart disease), all cancers, accidents, and congenital anomalies, i.e., mainly potential environmental diseases. We reconstructed a nested case-control study with Surveillance Study data and air pollution data in Korea. Among 1,037,210 insured persons who completed a questionnaire and physical examination in 1992, disease-free (for chronic respiratory disease and cancer) persons aged 35-64 with smoking status information were selected to reconstruct a cohort of 564,991 persons. The cohort was followed up to 1995 (1992-5), and the subjects who had the diseases in the Surveillance Study were selected. Finally, the patients with address information and available air pollution data remained as the 'final subjects'. Cases were defined as all lung cancer cases (424) and COPD admission cases (89), while the control group was defined as all patients among the 'final subjects' other than the two case groups. That is, cases are putative chronic environmental diseases, while controls are mainly acute environmental diseases. 
For exposure, air quality data from 73 monitoring sites between 1991 and 1993 were analyzed as a surrogate for the air pollution exposure level of the areas where the sites were located (58 areas). Data on five major air pollutants, TSP, $O_3$, $SO_2$, CO, and NOx, were available, and the area means were applied to the residents of each local area. The 3-year arithmetic mean value and the counts of days violating both the long-term and short-term standards during the period were used as indices of exposure. A multiple logistic regression model was applied. All analyses were adjusted for current and past smoking history, age, and gender. Results: Plain arithmetic means of pollutant levels did not reveal any relation to the risk of lung cancer or COPD, while the cumulative counts of non-attainment days did. None of the pollutant indices showed a significant positive association with COPD excess. Lung cancer risks were significantly and consistently associated with increases in $O_3$ and CO exceedance counts (at a corrected error level of 0.017), and less strongly and less consistently with $SO_2$ and TSP. $O_3$ and CO were estimated to increase the risks of lung cancer by 2.04 and 1.46, respectively; these are the maximal probable risks, derived from comparing a more polluted area (95%) with a cleaner area (5%). Conclusions: Although not decisive because of potential misclassification of exposure, these results were drawn by relatively conservative interpretation and could be used as evidence of a chronic health effect, especially for lung cancer. $O_3$ might be a candidate promoter of lung cancer, while CO should be considered a surrogate measure of motor vehicle emissions. The control selection in this study could have been less appropriate for COPD, and further evaluation in another setting might be necessary.
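
The two kinds of exposure index described above, a plain arithmetic mean and a count of days above a standard, can behave quite differently on the same data. The following is only an illustrative sketch: the function name, the readings, and the threshold are hypothetical and do not come from the study or from any actual air quality standard.

```python
from statistics import mean


def exposure_indices(daily_values, standard):
    """Return (arithmetic mean, number of days strictly above `standard`)."""
    exceedance_days = sum(1 for value in daily_values if value > standard)
    return mean(daily_values), exceedance_days


# Hypothetical daily O3 readings (ppm) for two areas with the same mean.
steady_area = [0.05, 0.05, 0.05, 0.05]
spiky_area = [0.01, 0.01, 0.01, 0.17]
limit = 0.10  # illustrative threshold only

print(exposure_indices(steady_area, limit))
print(exposure_indices(spiky_area, limit))
```

Both areas have essentially the same arithmetic mean, yet only the second one records an exceedance day, which shows how the two indices can rank areas differently.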


Analysis of Microsatellite Loci for Swimming Crab Portunus trituberculatus Populations in the Korean Side of the Yellow Sea (서해안에서 채집된 꽃게(Portunus trituberculatus) 집단에 대한 microsatellite 좌위의 분석)

  • Lee, Hye Jin;Yoon, Seong Jong;Hyun, Young Se;Kim, Hye Jin;Hwang, Sung-Il;Bae, Joo-Seung;Chung, Ki Wha
    • Journal of Life Science / v.23 no.9 / pp.1088-1095 / 2013
  • The swimming crab, Portunus trituberculatus, inhabits seafloor habitats containing sand or pebbles and is widely distributed throughout the world. The present study investigated genetic polymorphisms of 10 microsatellites in 281 samples of P. trituberculatus collected from four locations along the coastal waters of the Korean side of the Yellow Sea (Yeonggwang, Taean, Sorea, and Yeonpyeong-do Island). The number of alleles per locus ranged from 50 to 129, with a mean of 69.5. The observed and expected heterozygosity varied from 0.111 to 1.000 and from 0.609 to 0.979, respectively. The inbreeding coefficients (Fis) varied among the loci from -0.0207 to 0.8175. The genetic differentiation (Fst) was less than 0.05 (range 0.0020-0.0124). Therefore, the four groups of P. trituberculatus appeared to exhibit little genetic differentiation. The lack of differentiation was confirmed in a phylogenetic tree constructed by the unweighted pair group method with arithmetic average (UPGMA). The hypervariation between the populations and the lack of genetic differentiation may reflect active gene flow among the Yellow Sea populations and the absence of geographical boundaries. The highly polymorphic microsatellite loci will be useful for molecular and phylogenetic studies, as well as stock management, of the swimming crab, which is an important fishery resource.
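
For readers unfamiliar with UPGMA, here is a minimal pure-Python sketch of the clustering step. The distance matrix and the labels P1 to P4 are made up for illustration and are not the study's data; the function name `upgma` is likewise hypothetical. The sketch reports the raw inter-cluster distance at each merge rather than the halved branch heights often drawn in dendrograms.

```python
from itertools import combinations


def upgma(labels, matrix):
    """Build a UPGMA tree from a symmetric distance matrix.

    Returns (tree, merges): `tree` is a nested tuple of labels and `merges`
    lists (left, right, distance) in the order the clusters were joined.
    """
    # Each active cluster id maps to (subtree, number of leaves).
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    merges = []
    while len(clusters) > 1:
        # Join the closest pair of active clusters.
        (a, b), d_ab = min(dist.items(), key=lambda item: item[1])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        merges.append((tree_a, tree_b, d_ab))
        # The distance to the merged cluster is the size-weighted average of
        # the distances to its two children (the mean over all leaf pairs).
        new_dist = {}
        for c in clusters:
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            new_dist[(c, next_id)] = (n_a * d_ac + n_b * d_bc) / (n_a + n_b)
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        dist.update(new_dist)
        clusters[next_id] = ((tree_a, tree_b), n_a + n_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree, merges


# Hypothetical pairwise distances among four populations.
labels = ["P1", "P2", "P3", "P4"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree, merges = upgma(labels, matrix)
print(tree)
print([distance for _, _, distance in merges])
```

The "arithmetic average" in the method's name is the size-weighted mean in the inner loop: every original leaf contributes equally to the distance between two clusters.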

Rainfall Variations of Temporal Characteristics of Korea Using Rainfall Indicators (강수지표를 이용한 우리나라 강수량의 시간적인 특성 변화)

  • Hong, Seong-Hyun;Kim, Young-Gyu;Lee, Won-Hyun;Chung, Eun-Sung
    • Journal of Korea Water Resources Association / v.45 no.4 / pp.393-407 / 2012
  • This study presents the temporal and spatial variations of rainfall data in the Korean Peninsula. We derived rainfall amount, frequency, and extreme indices from 65 weather stations. The results are easily understood from graphs, and the Mann-Kendall trend analysis was also used to determine the tendency (upward, downward, or no trend) of rainfall and temperature where the trend was not clear. Moreover, using FARD, frequency probability rainfalls were calculated for 100 and 200 years and then compared with each other across the method of moments, the maximum likelihood method, and probability weighted moments. The characteristics of rainfall were classified mainly into Amount, Extremes, and Frequency, and the Average Rainfall Index (ARI), which represents the comprehensive rainfall risk for floods, was obtained by calculating the arithmetic mean of the RI for Amount (RIA), RI for Extreme (RIE), and RI for Frequency (RIF). As a result, these rainfall indices increased by 22.3%, 26.2%, and 5.1%, respectively, over the most recent decade. Since this study shows the recent climate change trend in detail, it will provide useful data for research on climate change adaptation.
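
The two computations named above can be sketched briefly. The ARI function simply follows the abstract's description (arithmetic mean of RIA, RIE, and RIF). The Mann-Kendall function uses the textbook form of the test, assuming a series without tied values and the usual continuity correction; the abstract does not say which variant the authors used, and the function names and sample numbers are hypothetical.

```python
from math import sqrt
from statistics import mean


def average_rainfall_index(ria, rie, rif):
    """ARI as the arithmetic mean of the amount, extreme and frequency indices."""
    return mean([ria, rie, rif])


def mann_kendall(series):
    """Return (S, Z) of the Mann-Kendall trend test, assuming no tied values."""
    n = len(series)
    # S counts later-minus-earlier sign agreements over all ordered pairs.
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S without ties
    if s > 0:
        z = (s - 1) / sqrt(var_s)
    elif s < 0:
        z = (s + 1) / sqrt(var_s)
    else:
        z = 0.0
    return s, z


# Made-up index values and a made-up, strictly increasing annual series.
print(average_rainfall_index(0.2, 0.5, 0.8))
print(mann_kendall([1.0, 2.0, 3.0, 4.0, 5.0]))
```

A positive Z indicates an upward trend and a negative Z a downward one; comparing |Z| against a normal quantile such as 1.96 gives the usual two-sided test at the 5% level.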

Detection of Gradual Transitions in MPEG Compressed Video using Hidden Markov Model (은닉 마르코프 모델을 이용한 MPEG 압축 비디오에서의 점진적 변환의 검출)

  • Choi, Sung-Min;Kim, Dai-Jin;Bang, Sung-Yang
    • Journal of KIISE:Software and Applications / v.31 no.3 / pp.379-386 / 2004
  • Video segmentation is a fundamental task in video indexing, and it involves two kinds of shot change detection: abrupt transitions and gradual transitions. Abrupt shot boundaries are detected by computing the image-based distance between adjacent frames and comparing this distance with a pre-determined threshold value. However, gradual shot boundaries are difficult to detect with this approach. To overcome this difficulty, we propose a method that detects gradual transitions in MPEG compressed video using the HMM (Hidden Markov Model). We consider two different HMMs: a discrete HMM and a continuous HMM with a Gaussian mixture model. As image features for the HMM's observations, we use two distinct features: the difference between the histograms of the DC images of two adjacent frames, and the difference between each macroblock's deviation and that of the corresponding macroblock in the adjacent frame, where deviation means the arithmetic difference of a macroblock's DC value from the mean of the DC values in the given frame. Furthermore, we obtain the DC sequences of P and B frames by first-order approximation for fast and effective computation. Experimental results show that the best detection and classification performance for gradual transitions is obtained when a continuous HMM with one Gaussian model is used and the two image features are used together.
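
The second feature described above, the change in each macroblock's deviation between adjacent frames, can be sketched as follows. This only illustrates the definition of deviation given in the abstract; the function names and the DC values are hypothetical, and the paper may aggregate the per-macroblock differences differently before feeding them to the HMM.

```python
from statistics import mean


def macroblock_deviations(dc_values):
    """Deviation of each macroblock's DC value from the frame's mean DC value."""
    frame_mean = mean(dc_values)
    return [value - frame_mean for value in dc_values]


def deviation_difference(prev_frame, cur_frame):
    """Per-macroblock difference of deviations between two adjacent frames."""
    prev_dev = macroblock_deviations(prev_frame)
    cur_dev = macroblock_deviations(cur_frame)
    return [cur - prev for prev, cur in zip(prev_dev, cur_dev)]


# Made-up DC values for four macroblocks in two adjacent frames.
prev_frame = [100, 110, 120, 130]
brighter = [110, 120, 130, 140]  # same picture, uniformly brighter
changed = [130, 100, 140, 90]    # content actually changes

print(deviation_difference(prev_frame, brighter))
print(deviation_difference(prev_frame, changed))
```

Because the frame mean is subtracted first, a uniform brightness shift leaves every deviation unchanged, so the feature responds only to changes in the spatial pattern of DC values.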

Studies on Benzo(a)pyrene of the Suspended Particulate in Atmosphere of Seoul City (서울시(市) 대기중(大氣中) 유해(有害) 부유분진(浮遊粉塵) 성분(成分)에 관(關)한 조사연구(調査硏究) -부유분진중(浮遊粉塵中)의 Benzo(a)pyrene에 관(關)하여-)

  • Kwon, Sook-Pyo;Chung, Yong;Lim, Dong-Koo
    • Journal of Preventive Medicine and Public Health / v.11 no.1 / pp.65-75 / 1978
  • This study was carried out to investigate air pollution by total suspended particulate (T.S.P.), benzene-soluble matter, and benzo(a)pyrene in Seoul city. The sampling areas were divided into commercial (Kwang Hwa Moon), industrial (Ku Ro Dong), and residential (Shin Chon) areas. Sampling was undertaken with a High Volume Air Sampler over four seasons from January 1977 to November 1977. The T.S.P. was extracted with benzene in a Soxhlet apparatus, and benzo(a)pyrene was separated by column chromatography and thin layer chromatography. The concentrations of benzo(a)pyrene were measured with a fluorophotometer, and the following results were obtained. 1. The arithmetic average concentrations of total suspended particulate for a 1-day averaging time were $275.6\,\mu g/m^3$ in Kwang Hwa Moon, $325.9\,\mu g/m^3$ in Ku Ro Dong, and $193.0\,\mu g/m^3$ in Shin Chon. 2. The seasonal variance of total suspended particulate was $102.7\,\mu g/m^3$ and $99.6\,\mu g/m^3$ at Ku Ro Dong and Shin Chon, respectively, and $39.9\,\mu g/m^3$ at Kwang Hwa Moon. The concentration in autumn was higher than that in spring at Ku Ro Dong and Shin Chon, but at Kwang Hwa Moon the seasonal variance was very small. 3. The concentrations at 50% frequency, taken from the geometric mean, for a 1-day averaging time were $264\,\mu g/m^3$, $300\,\mu g/m^3$, and $178\,\mu g/m^3$ at Kwang Hwa Moon, Ku Ro Dong, and Shin Chon, and the geometric standard deviations were 1.27, 1.38, and 1.41, respectively. 4. The concentrations of benzene-soluble matter were $26.9\,\mu g/m^3$ at Kwang Hwa Moon, $22.7\,\mu g/m^3$ at Ku Ro Dong, and $15.5\,\mu g/m^3$ at Shin Chon, and the ratios to the T.S.P. were 9.8% (range 5.6-14.8%), 7.0% (range 2.4-14.4%), and 8.0% (range 5.5-22.1%), respectively. 5. The concentrations of benzo(a)pyrene were $8.5\,\mu g/m^3$ (range $0.8-29.9\,\mu g/m^3$) at Kwang Hwa Moon, $10.9\,\mu g/m^3$ (range $1.1-52.0\,\mu g/m^3$) at Ku Ro Dong, and $5.8\,\mu g/m^3$ (range $1.5-11.4\,\mu g/m^3$) at Shin Chon. 6. The results of this investigation were relatively high compared with the recommended standards for suspended particulate in air of the U.S. Environmental Protection Agency and with observed levels of benzo(a)pyrene in U.S. cities.
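
The abstract reports arithmetic averages alongside geometric means and geometric standard deviations. As a rough sketch of how these three summaries relate, assuming positive concentrations and a sample (n-1) standard deviation of the logarithms, which the abstract does not specify; the function names and the readings are hypothetical and are not the study's measurements.

```python
from math import exp, log
from statistics import mean, stdev


def geometric_mean(values):
    """exp of the arithmetic mean of the logs; requires positive values."""
    return exp(mean([log(v) for v in values]))


def geometric_sd(values):
    """Geometric standard deviation: exp of the sample SD of the logs."""
    return exp(stdev([log(v) for v in values]))


# Made-up daily T.S.P. readings in micrograms per cubic metre.
tsp = [150.0, 200.0, 300.0, 600.0]

print(mean(tsp))            # arithmetic mean
print(geometric_mean(tsp))  # never larger than the arithmetic mean
print(geometric_sd(tsp))    # dimensionless, at least 1
```

For right-skewed data such as particulate concentrations, the geometric mean sits below the arithmetic mean, which matches the ordering of the figures in items 1 and 3 of the abstract.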


Study on the shrinkage properties of commercial hardwoods (유용(有用) 활엽수재(闊葉樹材)의 수축(收縮)에 관(關)한 연구(硏究))

  • Kim, Young-Suk;Lee, Won-Yong
    • Journal of the Korean Wood Science and Technology / v.4 no.1 / pp.8-14 / 1976
  • The shrinkage capacity of wood is very important basic data for the wood industry, but no such data are available yet in Korea. This study, as part of forest biological research, was therefore carried out to determine the shrinkage properties of commercial hardwoods in Korea. The results were as follows: 1) Shrinkage differed considerably among hardwoods; in general, heavy woods shrank more than light woods. 2) The arithmetic mean values of hardwood shrinkage were 9.03% in the tangential, 4.09% in the radial, and 0.37% in the longitudinal direction, and the ratio at : ar : al was 10 : 5.5 : 0.4. 3) Average shrinkage per 1% of moisture content differed by direction and species. 4) Shrinkage increased with increasing specific gravity. 5) The shrinkage of hardwoods tended to decrease as the annual ring width increased. 6) Shrinkage in the tangential direction was proportional to shrinkage in the radial direction.


Analysis of Genetic Relationship by RAPD Technique for Codonopsis lanceolata Trauty Collected from the Baekdoo Mountain and Korea (백두산지역과 국내 더덕 수집종의 RAPD에 의한 유연관계 분석)

  • Doo, Hong-Soo;Ryu, Jeom-Ho;Lee, Kang-Soo;Li, Hu Lin;Liu, Xian Hu
    • Korean Journal of Medicinal Crop Science / v.10 no.3 / pp.194-199 / 2002
  • Genomic DNA extracted from 16 accessions of Codonopsis lanceolata collected from South Korea and the Baekdoo Mt. areas of China was analyzed for genetic relationships by RAPD. Twenty 10-mer oligonucleotide primers showing reproducible polymorphism were selected for the RAPD analysis. The size of the amplified DNA was mostly between 125 bp and 2.0 kbp. The 16 collected accessions of Codonopsis lanceolata were analyzed with the 20 primers, which generated 73 (49.3%) polymorphic bands among 148 PCR products. The mean number of polymorphic bands was 7.4 and varied $1{\sim}9$ per primer. It was thus demonstrated that RAPD is useful for detecting polymorphism in Codonopsis lanceolata. The 1-F value (genetic similarity) ranged from 0.682 to 0.959. These results indicate variable genetic similarities. By UPGMA (Unweighted Pair Group Method using an Arithmetic average) cluster analysis based on the 1-F value, the genetic distance among the 16 collected accessions was $0.133{\sim}0.400$. The accessions were clearly classified into two groups, those collected from Korea and those collected from China, and the genetic distance between the groups was about 0.281. The accessions collected from Korea and China both showed minor differences, while Tonghua Xian and Liuhe Xian from China were genetically farthest from the other collected accessions.

Estimating Benzene Exposure Level over Time and by Industry Type through a Review of Literature on Korea

  • Park, Donguk;Choi, Sangjun;Ha, Kwonchul;Jung, Hyejung;Yoon, Chungsik;Koh, Dong-Hee;Ryu, Seunghun;Kim, Soogeun;Kang, Dongmug;Yoo, Kyemook
    • Safety and Health at Work / v.6 no.3 / pp.174-183 / 2015
  • The major purpose of this study is to construct a retrospective exposure assessment for benzene through a review of literature on Korea. Airborne benzene measurements reported in 34 articles were reviewed. A total of 15,729 individual measurements were compiled. Weighted arithmetic means [AM(w)] and their variance calculated across studies were summarized according to 5-year period intervals (prior to the 1970s through the 2010s) and industry type. Industries were classified according to Korea Standard Industrial Classification (KSIC) using information provided in the literature. We estimated quantitative retrospective exposure to benzene for each cell in the matrix through a combination of time and KSIC. Analysis of the AM(w) indicated reductions in exposure levels over time, regardless of industry, with mean levels prior to the 1980-1984 period of 50.4 ppm (n = 2,289), which dropped to 2.8 ppm (n = 305) in the 1990-1994 period, and to 0.1 ppm (n = 294) in the 1995-1999 period. There has been no improvement since the 2000s, when the AM(w) of 4.3 ppm (n = 6,211) for the 2005-2009 period and 4.5 ppm (n = 3,358) for the 2010-2013 period were estimated. A comparison by industry found no consistent patterns in the measurement results. Our estimated benzene measurements can be used to determine not only the possibility of retrospective exposure to benzene, but also to estimate the level of quantitative or semiquantitative retrospective exposure to benzene.
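
A weighted arithmetic mean across studies can be sketched as below. The abstract does not state the weights behind AM(w); this sketch assumes each study's reported mean is weighted by its number of measurements, and the function name and figures are hypothetical.

```python
def weighted_arithmetic_mean(study_means, sample_sizes):
    """Pool study means, weighting each by its number of measurements."""
    total_n = sum(sample_sizes)
    if total_n == 0:
        raise ValueError("at least one measurement is required")
    return sum(m * n for m, n in zip(study_means, sample_sizes)) / total_n


# Made-up example: a small study with a high mean and a large one with a low mean.
means_ppm = [10.0, 2.0]
counts = [10, 90]

print(weighted_arithmetic_mean(means_ppm, counts))  # pooled mean in ppm
print(sum(means_ppm) / len(means_ppm))              # unweighted mean, for contrast
```

With sample-size weights the pooled value equals the mean of all individual measurements, so a small study with extreme values cannot dominate the summary for a time period or industry cell.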

An improvement plan of Curriculum in Departments of Dental Technology (치기공과 교육과정의 개선방안)

  • Bae, Bong-Jin;Lee, Hwa-Sik;Park, Myung-Ho
    • Journal of Technologic Dentistry / v.31 no.4 / pp.55-66 / 2009
  • This research collected the Dental Technology curricula of a total of 20 schools in Korea, comprising 3-year and 4-year colleges, and analyzed the average credits that students gave to the subjects. The analysis led to the following conclusions: 1. In the arithmetic means for the basic major subjects, as answered by graduates and undergraduates for each subject, Seminar, Dental morphology I II, Dental morphology practice I II, and Dental devices & instruments do not have many credits. Among the applied major subjects, the averages for Implants (especially low), Occlusal anatomy practice I II, Dental ceramics practice I II, and Dental ceramics practice are low, while most subjects tend to converge at high points. 2. In the correlation analysis of the basic major subjects, Dental esthetics, Oral anatomy I II, Dental material practice III, Dental casting procedure, Oral hygiene, Health & medical law, Management administration, and Medical terminology show a meaningful difference (${\rho}$ < 0.05). 3. In the correlation analysis of the applied major subjects, Crown and bridge prosthodontics practice IV, Complete denture prosthodontics I II III, Complete denture prosthodontics practice I II III, Dental ceramics I II, Dental ceramics practice I II, Dental ceramics practice IIII, Occlusal anatomy I II, Occlusal anatomy practice I, Operative dentistry laboratory technology I, Operative dentistry laboratory technology practice II, Dental attachment laboratory technology practice, Implants, and Dental laboratory clinical practice show a meaningful difference (${\rho}$ < 0.05). 4. In the correlation analysis of the ratio of theory to practical training, 40:60 (38.57%) is the highest, followed by 30:70 (30.04%), 50:50 (23.32%), 60:40 (5.83%), and 70:30 (2.24%). These show a meaningful difference (${\rho}$ < 0.05). 5. In the correlation analysis by sex, Partial denture prosthodontics practice I II III, Complete denture prosthodontics I II III, Complete denture prosthodontics practice I II III, Occlusal anatomy practice I II, Implants, and Medical terminology show a meaningful difference (${\rho}$ < 0.05). To train the mid-level professionals that future society will require, the curriculum of Departments of Dental Technology needs to be planned and managed so that it reflects the opinions of graduates, undergraduates, and society, and so that it focuses on the user rather than the supplier.
