• Title/Abstract/Keywords: Mean vector

Search Result 707, Processing Time 0.402 seconds

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings with statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rest on strict assumptions. These include linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur with SVM. Moreover, SVM does not require many data samples for training, since it builds prediction models using only some representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can be potential causes for degrading SVM's performance. 
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not improve performance in multi-class classification problems as much as SVM does for binary-class classification. Second, approximation algorithms (e.g., decomposition methods, the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class problems, but they can deteriorate classification performance. Third, a difficulty in multi-class prediction problems is the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built due to a skewed boundary, and thus reduce the classification accuracy of the classifier. SVM ensemble learning is one of the machine learning methods for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted. Boosting thus attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem. 
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can take into account the geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
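
The gap between the arithmetic and geometric mean-based figures in this abstract can be illustrated with a minimal sketch. It assumes that both figures average per-class recall, which the abstract does not spell out; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}

def arithmetic_mean_accuracy(y_true, y_pred):
    """Arithmetic mean of per-class recall (assumed definition)."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition)."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1.0 / len(recalls))

# Toy imbalanced data: a "default classifier" that always predicts class A.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 10
am = arithmetic_mean_accuracy(y_true, y_pred)
gm = geometric_mean_accuracy(y_true, y_pred)
```

Under these assumptions the geometric mean collapses to zero as soon as one class is never predicted correctly, while the arithmetic mean stays at one half, which is why a geometric mean-based score penalizes neglect of minority classes more heavily.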

The Improvement of Convergence Characteristic using the New RLS Algorithm in Recycling Buffer Structures

  • Kim, Gwang-Jun;Kim, Chun-Suck
    • Journal of the Korea Institute of Information and Communication Engineering / v.7 no.4 / pp.691-698 / 2003
  • We extend the use of the method of least squares to develop a recursive algorithm for the design of adaptive transversal filters such that, given the least-squares estimate of the tap-weight vector of the filter at iteration n-1, we may compute the updated estimate of this vector at iteration n upon the arrival of new data. We begin the development of the RLS algorithm by reviewing some basic relations that pertain to the method of least squares. Then, by exploiting a relation in matrix algebra known as the matrix inversion lemma, we develop the RLS algorithm. An important feature of the RLS algorithm is that it utilizes information contained in the input data, extending back to the instant of time when the algorithm is initiated. In this paper, we propose a new tap-weight-updating RLS algorithm for an adaptive transversal filter with a data-recycling buffer structure. We show that, in terms of mean square error versus iteration number, the learning curve of the RLS algorithm with a data-recycling buffer converges faster than that of the existing RLS algorithm. The resulting rate of convergence is also typically an order of magnitude faster than that of the simple LMS algorithm. From three-dimensional simulation results of the mean square error as a function of the degree of channel amplitude distortion and the number of data-recycling buffers, we show that the number of samples required to converge to the specified value increases proportionally. With the newly proposed RLS algorithm, this improvement in convergence is achieved as a B-fold increase in the convergence speed of the mean square error as the number of data-recycling buffers increases.
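
For orientation, the recursion that follows from the matrix inversion lemma can be sketched as below. This is the standard exponentially weighted RLS update, not the data-recycling variant proposed in the paper; the function name, the forgetting factor `lam`, the initialization constant `delta`, and the toy two-tap system are all assumptions made for illustration.

```python
import random

def rls_identify(x, d, num_taps, lam=1.0, delta=0.01):
    """Standard RLS for a transversal filter (illustrative sketch).

    x: input samples, d: desired samples, lam: forgetting factor,
    delta: small constant so that the initial P equals I / delta.
    """
    w = [0.0] * num_taps
    P = [[(1.0 / delta if i == j else 0.0) for j in range(num_taps)]
         for i in range(num_taps)]
    for n in range(num_taps - 1, len(x)):
        u = [x[n - i] for i in range(num_taps)]           # tap-input vector
        Pu = [sum(P[i][j] * u[j] for j in range(num_taps))
              for i in range(num_taps)]
        denom = lam + sum(u[i] * Pu[i] for i in range(num_taps))
        k = [v / denom for v in Pu]                        # gain vector
        e = d[n] - sum(w[i] * u[i] for i in range(num_taps))  # a priori error
        w = [w[i] + k[i] * e for i in range(num_taps)]
        # P <- (P - k u^T P) / lam; P is symmetric, so u^T P equals (P u)^T.
        P = [[(P[i][j] - k[i] * Pu[j]) / lam for j in range(num_taps)]
             for i in range(num_taps)]
    return w

# Toy system identification: noise-free output of a hypothetical 2-tap channel.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(200)]
d = [0.0] + [0.5 * x[n] - 0.3 * x[n - 1] for n in range(1, len(x))]
w_hat = rls_identify(x, d, num_taps=2)
```

With noise-free data and `lam=1.0`, the estimate should land very close to the hypothetical taps `[0.5, -0.3]`; the small residual comes from the `delta` regularization in the initial matrix.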

Time- and Frequency-Domain Block LMS Adaptive Digital Filters: Part Ⅱ - Performance Analysis (시간영역 및 주파수영역 블럭적응 여파기에 관한 연구 : 제 2 부- 성능분석)

  • Lee, Jae-Chon;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / v.7 no.4 / pp.54-76 / 1988
  • In Part Ⅰ of the paper, we have developed various block least mean-square (BLMS) adaptive digital filters (ADF's) based on a unified matrix treatment. In Part Ⅱ we analyze the convergence behaviors of the self-orthogonalizing frequency-domain BLMS (FBLMS) ADF and the unconstrained FBLMS (UFBLMS) ADF both for the overlap-save and overlap-add sectioning methods. We first show that, unlike the FBLMS ADF with a constant convergence factor, the convergence behavior of the self-orthogonalizing FBLMS ADF is governed by the same autocorrelation matrix as that of the UFBLMS ADF. We then show that the optimum solution of the UFBLMS ADF is the same as that of the constrained FBLMS ADF when the filter length is sufficiently long. The mean of the weight vector of the UFBLMS ADF is also shown to converge to the optimum Wiener weight vector under a proper condition. However, the steady-state mean-squared error(MSE) of the UFBLMS ADF turns out to be slightly worse than that of the constrained algorithm if the same convergence constant is used in both cases. On the other hand, when the filter length is not sufficiently long, while the constrained FBLMS ADF yields poor performance, the performance of the UFBLMS ADF can be improved to some extent by utilizing its extended filter-length capability. As for the self-orthogonalizing FBLMS ADF, we study how we can approximate the autocorrelation matrix by a diagonal matrix in the frequency domain. We also analyze the steady-state MSE's of the self-orthogonalizing FBLMS ADF's with and without the constant. Finally, we present various simulation results to verify our analytical results.

The Design of Feature Selection Classifier based on Physiological Signal for Emotion Detection (감성판별을 위한 생체신호기반 특징선택 분류기 설계)

  • Lee, JeeEun;Yoo, Sun K.
    • Journal of the Institute of Electronics and Information Engineers / v.50 no.11 / pp.206-216 / 2013
  • Emotion plays a critical role in daily human life, including learning, action, decision-making, and communication. In this paper, an emotion discrimination classifier is designed that reduces system complexity by selecting a reduced set of dominant features from biosignals. Photoplethysmography (PPG), skin temperature, skin conductance, and frontal and parietal electroencephalography (EEG) signals were measured while subjects watched four types of movies associated with the induction of neutral, sad, fear, and joy emotions. A genetic algorithm with a support vector machine (SVM) based fitness function was designed to determine the dominant features among 24 parameters extracted from the measured biosignals. It shows a maximum classification accuracy of 96.4%, which is 17% higher than that of SVM alone. The minimum-error features selected are the mean and NN50 of heart rate variability from the PPG signal, the mean of the PPG-induced pulse transit time, the mean of skin resistance, and the $\delta$ and $\beta$ frequency band powers of the parietal EEG. The combination of parietal EEG, PPG, and skin resistance is recommended for high-accuracy instrumentation, while the combined use of PPG and skin conductance (79% accuracy) is affordable for simplified instrumentation.

Block Matching Motion Estimation Using Fast Search Algorithm (고속 탐색 알고리즘을 이용한 블록정합 움직임 추정)

  • 오태명
    • Journal of the Korean Institute of Telematics and Electronics T / v.36T no.3 / pp.32-40 / 1999
  • In this paper, we present a fast block matching motion estimation algorithm based on the successive elimination algorithm (SEA). Based on the center-biased distribution of motion vectors in the search area, the proposed method improves the performance of the SEA by reducing the number of search positions in the search area. In addition, to reduce the computational load, the method is combined with both the reduced bits mean absolute difference (RBMAD) matching criterion, which reduces the computational complexity of pixel comparison in block matching, and a pixel decimation technique, which reduces the number of pixels used in block matching. Simulation results show that the proposed method provides better performance than existing fast algorithms and performance similar to that of the full-search block motion estimation algorithm.
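
The elimination idea behind the SEA can be sketched as follows. The sketch relies on the bound |sum(current block) - sum(candidate block)| <= SAD, so any candidate whose block-sum difference already reaches the best SAD found so far can be skipped without a pixel-by-pixel comparison. It visits candidates from the center outward as a simple stand-in for a center-biased search order, and it includes neither RBMAD nor pixel decimation. All function names and the toy frames are hypothetical, and block sums are recomputed naively here rather than precomputed.

```python
import random

def block_sum(frame, top, left, size):
    """Sum of the pixels in a size x size block."""
    return sum(frame[top + y][left + x] for y in range(size) for x in range(size))

def sad(cur, ref, top, left, ref_top, ref_left, size):
    """Sum of absolute differences between a current and a reference block."""
    return sum(abs(cur[top + y][left + x] - ref[ref_top + y][ref_left + x])
               for y in range(size) for x in range(size))

def sea_search(cur, ref, top, left, size, radius):
    """Return (motion vector, best SAD, number of full SAD evaluations)."""
    height, width = len(ref), len(ref[0])
    cur_sum = block_sum(cur, top, left, size)
    offsets = [(dy, dx) for dy in range(-radius, radius + 1)
               for dx in range(-radius, radius + 1)]
    offsets.sort(key=lambda o: abs(o[0]) + abs(o[1]))  # center outward
    best_mv, best_sad, evaluated = None, float("inf"), 0
    for dy, dx in offsets:
        r, c = top + dy, left + dx
        if r < 0 or c < 0 or r + size > height or c + size > width:
            continue  # candidate block falls outside the reference frame
        if abs(cur_sum - block_sum(ref, r, c, size)) >= best_sad:
            continue  # lower bound on SAD cannot beat the current best
        evaluated += 1
        cost = sad(cur, ref, top, left, r, c, size)
        if cost < best_sad:
            best_mv, best_sad = (dy, dx), cost
    return best_mv, best_sad, evaluated

# Toy frames: the current frame is the reference shifted by (1, 2), edges clamped.
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(12)] for _ in range(12)]
cur = [[ref[min(y + 1, 11)][min(x + 2, 11)] for x in range(12)] for y in range(12)]
mv, cost, evaluated = sea_search(cur, ref, top=4, left=4, size=4, radius=3)
```

In this toy setup the search recovers the shift `(1, 2)` with zero SAD, and once that exact match is found every remaining candidate is eliminated by the bound.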

Temperature Classification of Heat-treated Metals using Pattern Recognition of Ultrasonic Signal (초음파 신호의 패턴 인식에 의한 금속의 열처리 온도 분류)

  • Im, Rae-Muk;Sin, Dong-Hwan;Kim, Deok-Yeong;Kim, Seong-Hwan
    • The Transactions of the Korean Institute of Electrical Engineers A / v.48 no.12 / pp.1544-1553 / 1999
  • Recently, ultrasonic testing techniques have been widely used to evaluate the quality of metals. In this experiment, six heat-treatment temperatures of the specimens were considered: 0, 1200, 1250, 1300, 1350, and 1387$^{\circ}$C. As the heat-treatment temperature increases, the grain size of the stainless steel also increases, which eventually leads to its destruction. In this paper, a pattern recognition method is proposed to identify the heat-treatment temperature of metals by evidence accumulation based on artificial intelligence with multiple feature parameters: difference absolute mean value (DAMV), variance (VAR), mean frequency (MEANF), autoregressive model coefficient (ARC), linear cepstrum coefficient (LCC), and adaptive cepstrum vector (ACV). The grain signal pattern recognition is carried out through the evidence accumulation procedure using the distances measured with reference parameters. In particular, ACV is superior to the other parameters. The results (96% successful pattern classification) are presented to support the feasibility of the suggested approach for ultrasonic grain signal pattern recognition.

A Novel Equivalent Wiener-Hopf Equation with TDL coefficient in Lattice Structure

  • Cho, Ju-Phil;Ahn, Bong-Man;Hwang, Jee-Won
    • Journal of information and communication convergence engineering / v.9 no.5 / pp.500-504 / 2011
  • In this paper, we propose an equivalent Wiener-Hopf equation. The proposed algorithm can obtain the weight vector of a TDL (tapped-delay-line) filter and the error simultaneously if the inputs are orthogonal to each other. The equivalent Wiener-Hopf equation was analyzed theoretically based on the MMSE (minimum mean square error) method. The results show that the proposed algorithm is equivalent to the original Wiener-Hopf equation. The new algorithm was applied to the identification of an unknown system to evaluate the performance of the proposed method. We compared the Wiener-Hopf solution with the equivalent Wiener-Hopf solution. The simulation results were similar to those obtained in the theoretical analysis. In conclusion, our method can find the coefficients of the TDL filter where a lattice filter is used, and also when the Gram-Schmidt orthogonalization process is used. Furthermore, a new cost function is suggested which may facilitate research in the adaptive signal processing area.

A Study on Flow Characteristics Behind the Bluff Body Using the PIV (PIV를 이용한 단순물체 후류의 유동특성에 관한 연구)

  • Choe, Sang-Bom;Cho, Dae-Hwan;Choi, Joo-Yol
    • Journal of Advanced Marine Engineering and Technology / v.35 no.1 / pp.89-95 / 2011
  • In this study, we modeled the deck house of a container ship as a representative bluff body and built a model ship. Using the PIV technique, the exhaust gas anti-reflux effect behind the deck house, according to the opening and closing of the sunken deck and the installation of a deflector on the deck house side, was measured in a circulating water channel. The experimental system consists of a high-speed camera, a laser, an image board, and a host computer. The mean velocity vector and the time-mean axial velocity were obtained behind the deck house, and the results were compared for each case.

Classification of Sleep/Wakefulness using Nasal Pressure for Patients with Sleep-disordered Breathing (비강압력신호를 이용한 수면호흡장애 환자의 수면/각성 분류)

  • Park, Jong-Uk;Jeoung, Pil-Soo;Kang, Kyu-Min;Lee, Kyoung-Joung
    • Journal of Biomedical Engineering Research / v.37 no.4 / pp.127-133 / 2016
  • This study examines the feasibility of automatic classification of sleep/wakefulness using nasal pressure in patients with sleep-disordered breathing (SDB). First, SDB events were detected using the methods developed in our previous studies. In epochs of normal breathing, we extracted features for classifying sleep/wakefulness based on time-domain, frequency-domain, and non-linear analysis. We then conducted an independent two-sample t-test and calculated the Mahalanobis distance (MD) between the two categories. As a result, $SD_{LEN}$ (MD = 0.84, p < 0.01), $P_{HF}$ (MD = 0.81, p < 0.01), $SD_{AMP}$ (MD = 0.76, p = 0.031), and $MEAN_{AMP}$ (MD = 0.75, p = 0.027) were selected as the optimal features. We classified sleep/wakefulness using a support vector machine (SVM). The classification results showed a mean sensitivity (Sen.), specificity (Spc.), and accuracy (Acc.) of 60.5%, 89.0%, and 84.8%, respectively. This method shows the possibility of automatically classifying sleep/wakefulness using nasal pressure alone.
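
As a rough illustration of ranking a single feature by the Mahalanobis distance between two categories, the sketch below uses the one-dimensional form, |mean difference| divided by the pooled standard deviation. The abstract does not state which covariance estimate was used, so the pooled sample variance is an assumption, and the function name and the toy values are hypothetical.

```python
import math
import statistics

def mahalanobis_1d(group_a, group_b):
    """One-dimensional Mahalanobis distance between two groups,
    using the pooled sample variance (assumed estimator)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    mean_gap = abs(statistics.mean(group_a) - statistics.mean(group_b))
    return mean_gap / math.sqrt(pooled_var)

# Toy feature values for two categories (not data from the study).
sleep = [1.0, 2.0, 3.0]
wake = [3.0, 4.0, 5.0]
md = mahalanobis_1d(sleep, wake)
```

Here both groups have unit sample variance and their means differ by 2, so the distance is 2.0; a larger value indicates a feature that separates the two categories more clearly.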

TIME SERIES PREDICTION USING INCREMENTAL REGRESSION

  • Kim, Sung-Hyun;Lee, Yong-Mi;Jin, Long;Chai, Duck-Jin;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.635-638 / 2006
  • Regression, as a conventional prediction technique in data mining, uses a model generated in the training step. This model is applied to new input data without any change. If this model is applied directly to time series, prediction accuracy decreases. This paper proposes an incremental regression for time series prediction, such as typhoon track prediction. This technique takes into account the characteristics of time series, which may change over time. It is composed of two steps. The first step executes a fractional process for applying input data to the regression model. The second step updates the model by using its information as new data. Additionally, the model is maintained using only recent data held in a queue. This approach has the following two advantages. It maintains the minimum information of the model by using a matrix, so space complexity is reduced. Moreover, it prevents the error rate from increasing by updating the model over time. The accuracy of the proposed method is measured by RME (Relative Mean Error) and RMSE (Root Mean Square Error). The results of a typhoon track prediction experiment show that the proposed technique, IMLR (Incremental Multiple Linear Regression), is more efficient than MLR (Multiple Linear Regression) and SVR (Support Vector Regression).
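
The idea of keeping only recent data in a queue while storing compact summary statistics can be sketched as follows. This is a simplified single-predictor version written for illustration, not the IMLR method of the paper; the class name, the window size, and the toy series are assumptions.

```python
from collections import deque

class SlidingWindowRegression:
    """Simple linear regression over the most recent `window` points,
    updated incrementally from running sums (illustrative sketch)."""

    def __init__(self, window):
        self.window = window
        self.points = deque()
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def _accumulate(self, x, y, sign):
        # sign=+1 adds a point to the running sums, sign=-1 removes it.
        self.n += sign
        self.sx += sign * x
        self.sy += sign * y
        self.sxx += sign * x * x
        self.sxy += sign * x * y

    def update(self, x, y):
        self.points.append((x, y))
        self._accumulate(x, y, +1)
        if len(self.points) > self.window:
            old_x, old_y = self.points.popleft()  # drop the oldest point
            self._accumulate(old_x, old_y, -1)

    def coefficients(self):
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept

    def predict(self, x):
        slope, intercept = self.coefficients()
        return slope * x + intercept

# Toy series whose trend changes from y = 2x + 1 to y = -x + 30.
model = SlidingWindowRegression(window=5)
for x in range(10):
    model.update(float(x), 2.0 * x + 1.0)
first_fit = model.coefficients()
for x in range(10, 15):
    model.update(float(x), -1.0 * x + 30.0)
second_fit = model.coefficients()
```

Because old points leave the window, the second fit reflects only the new trend, whereas a model trained once on all fifteen points would blend the two regimes. Only five running sums and the queue are stored, regardless of how long the series runs.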
