• Title/Summary/Keyword: Classification Performance

Application of Random Over Sampling Examples(ROSE) for an Effective Bankruptcy Prediction Model (효과적인 기업부도 예측모형을 위한 ROSE 표본추출기법의 적용)

  • Ahn, Cheolhwi;Ahn, Hyunchul
    • The Journal of the Korea Contents Association / v.18 no.8 / pp.525-535 / 2018
  • If the frequency of one class is far higher than that of the other classes in a classification problem, a data imbalance problem arises that distorts machine learning. Corporate bankruptcy prediction often suffers from data imbalance because the proportion of insolvent companies is generally very low, whereas the proportion of solvent companies is very high. Mitigating this problem requires a proper sampling technique. Until now, oversampling techniques, which adjust the class distribution of a data set by sampling the minority class with replacement, have been widely used. However, they carry a risk of overfitting. Against this background, this study applies the ROSE (Random Over Sampling Examples) technique, proposed by Menardi and Torelli in 2014, to corporate bankruptcy prediction. ROSE creates new training samples by synthesizing them from the existing training samples, so it improves the prediction accuracy of classifiers while avoiding the risk of overfitting. Specifically, our study combines the ROSE method with SVM (support vector machine), which is widely regarded as a strong binary classifier. We applied the proposed method to a real-world bankruptcy prediction case from a major Korean bank and compared its performance with that of other sampling techniques. Experimental results showed that ROSE improved the prediction accuracy of SVM in bankruptcy prediction compared to the other techniques, with statistical significance. These results suggest that ROSE can be a good alternative for resolving data imbalance in prediction problems in social science areas beyond bankruptcy prediction.
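
The smoothed-bootstrap idea behind ROSE, as described in the abstract above, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `rose_sample`, the Gaussian kernel, and the Silverman-style per-feature bandwidth are assumptions made for the example, and every class is assumed to have at least two samples.

```python
import numpy as np

def rose_sample(X, y, n_new, seed=None, shrink=1.0):
    """Draw n_new synthetic samples with roughly balanced classes.

    Hypothetical helper: for each new sample, pick a class with equal
    probability, pick a random real sample of that class, and perturb it
    with Gaussian noise (a smoothed bootstrap).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    d = X.shape[1]

    X_new = np.empty((n_new, d))
    y_new = rng.choice(classes, size=n_new)  # equal class probability

    for c in classes:
        Xc = X[y == c]
        n_c = len(Xc)
        idx = np.flatnonzero(y_new == c)
        # Assumed bandwidth: Silverman-style rule applied per feature.
        h = shrink * (4.0 / ((d + 2) * n_c)) ** (1.0 / (d + 4)) * Xc.std(axis=0, ddof=1)
        # Pick real samples with replacement, then add kernel noise.
        centers = Xc[rng.integers(0, n_c, size=len(idx))]
        X_new[idx] = centers + rng.normal(size=(len(idx), d)) * h
    return X_new, y_new
```

Unlike plain oversampling with replacement, the synthetic points are not exact copies of minority samples, which is the property the abstract credits with reducing overfitting. The resulting `X_new, y_new` would then be passed to a classifier such as an SVM.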

Improving a Korean Spell/Grammar Checker for the Web-Based Language Learning System (웹기반 언어 학습시스템을 위한 한국어 철자/문법 검사기의 성능 향상)

  • 남현숙;김광영;권혁철
    • Korean Journal of Cognitive Science / v.12 no.3 / pp.1-18 / 2001
  • The goal of this paper is the pedagogical application of a Korean Spell/Grammar Checker to a web-based language learning system for Korean writing. To maximize the instructional efficiency of our learning system 'Urimal Baeumteo', we have to improve our Korean Spell/Grammar Checker. Today, an NLP system's performance depends on its semantic processing capability. In our Korean Spell/Grammar Checker, the tasks accomplished at the semantic level are the detection and correction of misused derived and compound nouns in the Korean spell-checking component, and the detection and correction of syntactic and semantic errors in the Korean grammar-checking component. We describe a common approach to partial parsing using collocation rules based on dependency grammar. To provide more detailed semantic rules, we classified nouns according to their concepts and subcategorized verbs according to their syntactic and semantic features. Improving the Korean Spell/Grammar Checker makes our learning system active and intelligent in a web-based environment. We acknowledge the flaws in our system: classifying nouns based on their meanings and concepts is a time-consuming task, and the analytic unit of this study is principally limited to the phrases in a sentence, so the accurate parsing of embedded sentences remains a difficult problem. For a web-based language learning system, it is critically important to consider the interface design and the structure of the contents.

Performance Evaluation and Forecasting Model for Retail Institutions (유통업체의 부실예측모형 개선에 관한 연구)

  • Kim, Jung-Uk
    • Journal of Distribution Science / v.12 no.11 / pp.77-83 / 2014
  • Purpose - The National Agricultural Cooperative Federation of Korea and the National Fisheries Cooperative Federation of Korea operate both financial and retail businesses. As cooperatives are public institutions and receive government support, the Financial Supervisory Service in Korea requires them to be soundly managed. Soundness is mainly assessed with CAEL, a model adapted from CAMEL. However, the NFFC's finance and retail businesses are evaluated together as a single unit, and the CAEL model's classification is too coarse to evaluate the retail industry. First, CAEL has a problem with discrimination power. Although a retail-sector union can receive a high rating under the CAEL model, defaults have often been reported. Therefore, a default prediction model is needed to support the CAEL model. A default prediction model built on subdivided indexes and statistical methods can serve a preventive function by estimating the retail sector's default probability. Second, the finance and retail business sectors need to be treated separately because their businesses have different characteristics. Based on various management indexes that have been systematically managed by the National Fisheries Cooperative Federation of Korea, our model predicts retail default and outperforms the CAEL model in failure prediction because it uses various discriminative financial ratios that reflect the situation of the retail industry. Research design, data, and methodology - The model to predict retail default was built using logistic analysis. To develop the predictive model, we use the retail financial statements of the NFCF. We consider 93 unions each year from 2006 to 2012 to select reliable management indexes. We also applied statistical power analysis, namely a t-test, logit analysis, AR (accuracy ratio), and AUROC (Area Under Receiver Operating Characteristic) analysis. Finally, through the multivariate logistic model, we show that it has excellent discrimination power and a higher hit ratio for default prediction. We also evaluate its usefulness. Results - The statistical power analysis using the AR (AUROC) method on the short-term model shows that the logistic model has excellent discrimination power, at 84.6%. Further, its hit ratio for failure prediction on the total model is high, at 94%, indicating that it is temporally stable and useful for evaluating the management status of retail institutions. Conclusions - This model is useful for evaluating the management status of retail union institutions. First, the CAEL evaluation needs to be subdivided. The existing CAEL evaluation is underdeveloped, and its discrimination power is low. Second, continued efforts are required to develop varied and rational management indexes. An index reflecting retail industry characteristics needs to be developed. Extending this study will require the following. First, it will require a complementary default model reflecting size differences. Second, for small and medium retailers, it will need non-financial information. The result would therefore be a hybrid default model reflecting financial and non-financial information.
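
The two discrimination measures named in the abstract above, AUROC and AR (accuracy ratio), are closely related: under the usual definition, AR = 2 × AUROC − 1. The sketch below computes both from predicted default scores. The function names and the toy data are assumptions for illustration, not the paper's code or data.

```python
def auroc(scores, labels):
    """AUROC as the probability that a defaulter outscores a non-defaulter.

    labels: 1 = default, 0 = non-default. Ties count as half a win.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_ratio(scores, labels):
    """Accuracy ratio (Gini coefficient) derived from AUROC."""
    return 2.0 * auroc(scores, labels) - 1.0

# Toy example: two defaulters (label 1) and two healthy unions (label 0).
scores = [0.9, 0.2, 0.8, 0.1]
labels = [1, 1, 0, 0]
print(auroc(scores, labels))           # 0.75
print(accuracy_ratio(scores, labels))  # 0.5
```

A model that ranks every defaulter above every non-defaulter gives AUROC = 1 and AR = 1, while a random ranking gives AUROC ≈ 0.5 and AR ≈ 0.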

Cycle Extendability of Torus Sub-Graphs in the Enhanced Pyramid Network (개선된 피라미드 네트워크에서 토러스 부그래프의 사이클 확장성)

  • Chang, Jung-Hwan
    • Journal of Korea Multimedia Society / v.13 no.8 / pp.1183-1193 / 2010
  • The pyramid graph is well known in parallel processing as an interconnection network topology based on regular square mesh and tree architectures. The enhanced pyramid graph is an alternative architecture that replaces the mesh in each layer with the corresponding torus to improve performance over the pyramid. In this paper, we classify the edges of the regular square torus, the basic sub-graph constituting each layer of the enhanced pyramid graph, into two disjoint groups. The edge set of the torus graph is divided into two disjoint subsets called NPC (candidate edges for neighbor-parent) and SPC (candidate edges for shared-parent), according to whether the parent vertices adjacent to the two end vertices of an edge are neighbors or are one shared vertex in the upper layer of the enhanced pyramid graph. In addition, we introduce the notion of a shrink graph, which focuses only on the NPC-edges by hiding the SPC-edges within shrunk super-vertices. We show that the lower and upper bounds on the number of NPC-edges in a Hamiltonian cycle constructed on a $2^n{\times}2^n$ torus are $2^{2n-2}$ and $3{\cdot}2^{2n-2}$, respectively. By extending this result to the enhanced pyramid graph, we also prove that the maximum number of NPC-edges that a Hamiltonian cycle can contain is $4^{n-1}-2n+1$ in the n-dimensional enhanced pyramid.
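
The NPC/SPC edge classification described above can be illustrated with a short enumeration. This sketch assumes the standard pyramid parent rule, in which vertex (x, y) of a layer has parent (x // 2, y // 2) in the layer above. The abstract does not state that rule, and the function names are hypothetical.

```python
def classify_torus_edges(n):
    """Count NPC and SPC edges of the 2^n x 2^n torus (intended for n >= 2)."""
    size = 2 ** n
    npc = spc = 0
    for x in range(size):
        for y in range(size):
            # Each vertex owns its edge to the right and its edge downward,
            # so every torus edge (including wrap-around) is visited once.
            for nx, ny in (((x + 1) % size, y), (x, (y + 1) % size)):
                if (x // 2, y // 2) == (nx // 2, ny // 2):
                    spc += 1  # both ends share one parent
                else:
                    npc += 1  # the two parents are neighbors
    return npc, spc

def npc_bounds_in_hamiltonian_cycle(n):
    """Lower and upper bounds quoted in the abstract for the 2^n x 2^n torus."""
    return 2 ** (2 * n - 2), 3 * 2 ** (2 * n - 2)

print(classify_torus_edges(2))             # (16, 16)
print(npc_bounds_in_hamiltonian_cycle(2))  # (4, 12)
```

For n = 2, a Hamiltonian cycle on the 16-vertex torus has 16 edges, so the quoted bounds say that between 4 and 12 of them are NPC-edges, that is, between one quarter and three quarters of the cycle.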

English Phoneme Recognition using Segmental-Feature HMM (분절 특징 HMM을 이용한 영어 음소 인식)

  • Yun, Young-Sun
    • Journal of KIISE:Software and Applications / v.29 no.3 / pp.167-179 / 2002
  • In this paper, we propose a new acoustic model for characterizing segmental features, together with an algorithm based on the general framework of hidden Markov models (HMMs), in order to compensate for the weaknesses of the HMM assumptions. The segmental features are represented as a trajectory of the observed vector sequence by a polynomial regression function, because a single-frame feature cannot represent the temporal dynamics of speech signals effectively. To apply the segmental features to pattern classification, we adopted the segmental HMM (SHMM), which is known as an effective method for representing the trend of speech signals. SHMM separates the observation probability of a given state into extra- and intra-segmental variations, which capture long-term and short-term variability, respectively. To account for segmental characteristics in the acoustic model, we present the segmental-feature HMM (SFHMM), a modification of the SHMM. The SFHMM represents the external and internal variations as the observation probability of the trajectory in a given state and the trajectory estimation error for the given segment, respectively. We conducted several experiments on the TIMIT database to establish the effectiveness of the proposed method and the characteristics of the segmental features. From the experimental results, we conclude that the proposed method, although it has more parameters than a conventional HMM, is valuable for its flexible and informative feature representation and its performance improvement.
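
The idea of representing a segment as a polynomial trajectory, with the fit residual serving as a trajectory estimation error, can be sketched as below. This is an illustration only: the function name, the normalized time axis, the default degree, and the mean squared residual are assumptions, not the paper's exact formulation.

```python
import numpy as np

def fit_segment_trajectory(frames, degree=2):
    """Fit one polynomial per feature dimension over a segment.

    frames: array of shape (T, D), T frames of D-dimensional features.
    Returns (coef, trajectory, error) where coef has shape (degree+1, D).
    """
    frames = np.asarray(frames, dtype=float)
    T = frames.shape[0]
    t = np.linspace(0.0, 1.0, T)                    # normalized time in [0, 1]
    V = np.vander(t, degree + 1, increasing=True)   # columns 1, t, t^2, ...
    coef, *_ = np.linalg.lstsq(V, frames, rcond=None)
    trajectory = V @ coef                           # smoothed segment
    # Mean squared distance between observed frames and the trajectory.
    error = float(np.mean(np.sum((frames - trajectory) ** 2, axis=1)))
    return coef, trajectory, error
```

A small `error` means the segment is well described by its trajectory. In a segmental model, the trajectory and the residual would each get their own probability term.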

Locally Linear Embedding for Face Recognition with Simultaneous Diagonalization (얼굴 인식을 위한 연립 대각화와 국부 선형 임베딩)

  • Kim, Eun-Sol;Noh, Yung-Kyun;Zhang, Byoung-Tak
    • Journal of KIISE / v.42 no.2 / pp.235-241 / 2015
  • Locally linear embedding (LLE) [1] is a manifold learning algorithm that preserves the inner product values between high-dimensional data points when embedding them into a low-dimensional space. LLE embeds data points in the same subspace close together in the low-dimensional space, because those data points have significant inner product values. On the other hand, data points that are orthogonal to each other are embedded separately in the low-dimensional space, even if they are in close proximity in the high-dimensional space. Meanwhile, it is well known that facial images of the same person under varying illumination lie in a low-dimensional linear subspace [2]. In this study, we suggest an improved LLE method for the face recognition problem. The method exploits the property of LLE that data points are embedded completely separately when they are orthogonal to each other. To accomplish this, the subspaces formed by the individual classes are forced to be mutually orthogonal. To make all of the subspaces orthogonal, the simultaneous diagonalization (SD) technique is applied. Experimental results show that the suggested method dramatically improves the embedding results and classification performance.

Hybrid Behavior Evolution Model Using Rule and Link Descriptors (규칙 구성자와 연결 구성자를 이용한 혼합형 행동 진화 모델)

  • Park, Sa Joon
    • Journal of Intelligence and Information Systems / v.12 no.3 / pp.67-82 / 2006
  • We propose HBEM (Hybrid Behavior Evolution Model), which combines rule classification and an evolutionary neural network using rule descriptors and link descriptors, for the evolutionary behavior of virtual robots. In our model, behavior knowledge is represented at two levels. At the upper level, the representation is improved by using rule and link descriptors together. At the lower level, behavior knowledge is represented as bit strings and learned by adapting the chromosomes with genetic operators. A virtual robot is composed from the learned chromosome with the best fitness. The composed virtual robot perceives its surroundings, classifies the pattern through the rules, processes the result in the neural network, and then behaves accordingly. To evaluate the proposed model, we developed HBES (Hybrid Behavior Evolution System) and applied it to a food-gathering problem for virtual robots. In tests of our system, the learning time was shorter than that of the evolutionary neural network under the same conditions. To evaluate how much the rules improve fitness, we measured the fitness of the fully learned chromosomes with and without the rules applied. The fitness was lower when the rules were not applied. These results show that the proposed model learns better and behaves more regularly than the evolutionary neural network in the behavior evolution of virtual robots.

Vehicle Recognition with Recognition of Vehicle Identification Mark and License Plate (차량 식별마크와 번호판 인식을 통한 차량인식)

  • Lee Eung-Joo;Kim Sung-Jin;Kwon Ki-Ryong
    • Journal of Korea Multimedia Society / v.8 no.11 / pp.1449-1461 / 2005
  • In this paper, we propose a vehicle recognition system based on the classification of the vehicle identification mark and recognition of the vehicle license plate. In the proposed algorithm, we first perform preprocessing procedures such as noise reduction and thinning on the input vehicle image, and detect the vehicle identification mark and license plate regions using the frequency distribution of intensity variation. We then classify the extracted candidate regions into identification mark, characters, and numbers using structural feature information of the vehicle. Lastly, we recognize the vehicle information by recognizing the identification mark, characters, and numbers using a hybrid and vertical/horizontal pattern vector method. The proposed algorithm uses three properties of vehicle information: the independency property, the discriminance property, and the frequency distribution of intensity variation property. In vehicle images, the identification mark is generally independent of the types of vehicle and vehicle identification mark. In addition, in the license plate region, the contrast between characters and background and the horizontal/vertical intensity variations are more noticeable than in other regions. To show the efficiency of the proposed algorithm, we tested it on 350 vehicle images and found that the proposed method shows good performance regardless of irregular environmental conditions as well as noise, size, and location of vehicles.

Comparative Analysis of Korean Universities' Co-author Credit Allocation Standards on Journal Publications (국내대학의 학술논문 공동연구 기여도 산정 기준 비교 분석)

  • Lee, Hyekyung;Yang, Kiduk
    • Journal of Korean Library and Information Science Society / v.46 no.4 / pp.191-205 / 2015
  • As the first step in developing the optimal co-authorship allocation method, this study investigated the co-authorship allocation standards of Korean universities for journal publications. The study compared the standards of 27 Korean universities with Library and Information Science (LIS) departments, and analyzed author rankings generated by applying the inflated, fractional, harmonic, and university standard methods of co-authorship allocation to 189 Korean LIS faculty publications from 2001 to 2014. The university standards most similar to the standard co-authorship allocation method in bibliometrics (i.e., Vinkler's) were those whose co-author credits summed to 1. However, the university standards differed from Vinkler's in allocating author credits based on primary and secondary author classification instead of author ranks. The statistical analysis of author rankings showed that the harmonic method was most similar to the university standards. However, the correlation between the harmonic method and the university standards whose co-author credits summed to more than 1 was lower. The study results also suggested that middle-level authors are most sensitive to co-authorship allocation methods. However, even the most generous university standards of co-authorship allocation still penalize collaborative research by reducing each co-author's credit below that of a single author. Follow-up studies will be needed to investigate the optimal method of co-authorship credit allocation.
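
Three of the allocation methods named in the abstract above have simple closed forms, shown here as a sketch using their usual bibliometric definitions. The function names are hypothetical. The university standards and Vinkler's scheme are not reproduced because the abstract does not specify them.

```python
def inflated_credit(n_authors):
    """Inflated counting: every author receives full credit."""
    return [1.0] * n_authors

def fractional_credit(n_authors):
    """Fractional counting: one credit split equally among authors."""
    return [1.0 / n_authors] * n_authors

def harmonic_credit(n_authors):
    """Harmonic counting: the i-th author gets (1/i) / (1 + 1/2 + ... + 1/N)."""
    norm = sum(1.0 / j for j in range(1, n_authors + 1))
    return [(1.0 / i) / norm for i in range(1, n_authors + 1)]

# Three-author paper: harmonic credits are 6/11, 3/11 and 2/11.
print(harmonic_credit(3))
```

Fractional and harmonic credits sum to 1 per paper, while inflated credits sum to the number of authors. That distinction matches the abstract's split between standards whose credits sum to 1 and those whose credits sum to more than 1.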

Filter Selection Method Using CSP and LDA for Filter-bank based BCI Systems (필터 뱅크 기반 BCI 시스템을 위한 CSP와 LDA를 이용한 필터 선택 방법)

  • Park, Geun-Ho;Lee, Yu-Ri;Kim, Hyoung-Nam
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.5 / pp.197-206 / 2014
  • Motor imagery based brain-computer interface (BCI), which has recently attracted attention, is a technique for decoding the user's voluntary motor intention using electroencephalography (EEG). For classifying motor imagery, event-related desynchronization (ERD), the phenomenon of an EEG voltage drop over the sensorimotor area in the ${\mu}$-band (8-13 Hz), has generally been used, but this method is not free from degradation of BCI system performance because EEG has low spatial resolution and the band in which ERD appears differs between users. Common spatial pattern (CSP) was proposed to solve the low spatial resolution problem, but it has the disadvantage of being very sensitive to frequency-band selection. Discriminative filter bank common spatial pattern (DFBCSP) tried to solve the frequency-band selection problem by using the Fisher ratio of the averaged EEG signal power and establishing a discriminative filter bank (DFB) that includes only the feature frequency bands. However, we found that the DFB might not include the proper filters showing the spatial pattern of ERD. To solve this problem, we apply to DFBCSP a band-selection process that uses CSP feature vectors and linear discriminant analysis instead of the averaged EEG signal power. The filter selection results and the classification accuracies of the existing and proposed methods show that the CSP feature is more effective than the signal power feature.
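
The CSP step that the abstract above builds on can be sketched with a standard whitening-based formulation. This is not the paper's DFBCSP pipeline or its band-selection procedure. The function names, the trace normalization, and the log-variance feature are common choices assumed here for illustration, and the input is assumed to be already band-pass filtered.

```python
import numpy as np

def csp_filters(trials_a, trials_b):
    """Compute CSP spatial filters for two classes.

    trials_*: arrays of shape (n_trials, n_channels, n_samples).
    Returns (W, d): rows of W are spatial filters, sorted so the first
    rows maximize class-a variance and the last rows class-b variance.
    """
    def mean_cov(trials):
        covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
        return np.mean(covs, axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    evals, evecs = np.linalg.eigh(Ca + Cb)
    P = np.diag(evals ** -0.5) @ evecs.T      # whitens the composite covariance
    d, B = np.linalg.eigh(P @ Ca @ P.T)       # eigenvalues lie in [0, 1]
    order = np.argsort(d)[::-1]
    return B[:, order].T @ P, d[order]

def log_variance_features(W, trial, n_pairs=1):
    """Log of normalized variance for the first and last n_pairs filters."""
    rows = np.r_[0:n_pairs, -n_pairs:0]
    z = W[rows] @ trial
    var = z.var(axis=1)
    return np.log(var / var.sum())
```

In a filter-bank setting, these features would be computed once per frequency band and then scored, for example with linear discriminant analysis, to decide which bands to keep.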