• Title/Summary/Keyword: misclassification

Hierarchically penalized support vector machine for the classification of imbalanced data with grouped variables (그룹변수를 포함하는 불균형 자료의 분류분석을 위한 서포트 벡터 머신)

  • Kim, Eunkyung;Jhun, Myoungshic;Bang, Sungwan
    • The Korean Journal of Applied Statistics
    • /
    • v.29 no.5
    • /
    • pp.961-975
    • /
    • 2016
  • The hierarchically penalized support vector machine (H-SVM) has been developed to perform simultaneous classification and input variable selection when input variables are naturally grouped or generated by factors. However, the H-SVM may suffer from estimation inefficiency because it applies the same amount of shrinkage to each variable without assessing its relative importance. In addition, when analyzing imbalanced data with uneven class sizes, the classification accuracy of the H-SVM may drop significantly in predicting the minority class because its classifiers are undesirably biased toward the majority class. To remedy these problems, we propose the weighted adaptive H-SVM (WAH-SVM) method, which uses adaptive tuning parameters to improve the performance of variable selection and weights to differentiate the misclassification of data points between classes. Numerical results are presented to demonstrate the competitive performance of the proposed WAH-SVM over existing SVM methods.
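
The idea of using weights to differentiate misclassification between classes can be illustrated with a class-weighted hinge loss. This is only a minimal sketch of that general idea, not the WAH-SVM penalty itself, which the abstract does not spell out. The function name, the weights `w_pos` and `w_neg`, and the toy scores are all hypothetical.

```python
def weighted_hinge_loss(scores, labels, w_pos, w_neg):
    """Mean class-weighted hinge loss for labels in {+1, -1}.

    scores: decision values f(x) from some classifier (assumed given).
    w_pos, w_neg: hypothetical weights for the +1 and -1 classes.
    """
    total = 0.0
    for f, y in zip(scores, labels):
        weight = w_pos if y == 1 else w_neg
        # Hinge loss is zero once the point is beyond the margin.
        total += weight * max(0.0, 1.0 - y * f)
    return total / len(labels)


# Hypothetical toy data: one minority (+1) point and three majority (-1) points.
scores = [-0.5, -1.2, -0.8, 0.3]
labels = [1, -1, -1, -1]

unweighted = weighted_hinge_loss(scores, labels, w_pos=1.0, w_neg=1.0)
weighted = weighted_hinge_loss(scores, labels, w_pos=3.0, w_neg=1.0)
print(unweighted, weighted)
```

Raising `w_pos` makes the misclassified minority point dominate the loss, which is the mechanism that pushes a fitted classifier away from the majority-class bias described above.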

People Counting Method using Moving and Static Points of Interest (동적 및 정적 관심점을 이용하는 사람 계수 기법)

  • Gil, Jong In;Mahmoudpour, Saeed;Whang, Whan-Kyu;Kim, Manbae
    • Journal of Broadcast Engineering
    • /
    • v.22 no.1
    • /
    • pp.70-77
    • /
    • 2017
  • Among available people counting methods, map-based approaches based on moving interest points have shown good performance. However, counting stationary people is challenging in such methods, since all static points of interest are treated as background. To include stationary people in the count, it is necessary to discriminate between the static points of stationary people and those of the background region. In this paper, we propose a people counting method that uses both moving and static points. The proposed method separates the moving and static points using motion information. Then, the static points of the stationary people are classified using foreground mask processing and point pattern analysis. The experimental results reveal that the proposed method provides more accurate count estimation by including stationary people. In addition, background updating is enabled to solve the static point misclassification problem caused by background changes.

The Classification of Congestion and Wireless Losses for TCP Segments Using ROTT (상대전송지연시간을 이용한 TCP 세그먼트의 혼잡 손실과 무선 손실 구분 알고리즘)

  • Shin, Kwang-Sik;Lee, Bo-Ram;Kim, Ki-Won;Jang, Mun-Suck;Yoon, Wan-Oh;Choi, Sang-Bang
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.32 no.8A
    • /
    • pp.858-870
    • /
    • 2007
  • TCP is a popular protocol for reliable data delivery in the Internet. In recent years, wireless environments with transmission errors have become more common. Therefore, there is significant interest in using TCP over wireless links. Previous works have shown that, unless the protocol is modified, TCP may perform poorly on paths that include a wireless link subject to transmission errors. The reason for this is the implicit assumption in TCP that all packet losses are due to congestion, which causes an unnecessary reduction of the transmission rate when the cause of packet losses is wireless transmission errors. In this paper, we propose a new loss differentiation algorithm (LDA) that monitors the network congestion level using the relative one-way trip time (ROTT). We evaluate the performance of our scheme and compare it with TCP Veno and the Spike scheme using NS2 (Network Simulator 2). In our experiment, our scheme reduces the packet loss misclassification to a maximum of 55% of that of other schemes. The results of another simulation show that our scheme raises its transmission rate while preserving fairness.

Extraction of the aquaculture farms information from the Landsat-TM imagery of the Younggwang coastal area

  • Shanmugam, P.;Ahn, Yu-Hwan;Yoo, Hong-Ryong
    • Proceedings of the Korean Association of Geographic Information Studies Conference
    • /
    • 2004.03a
    • /
    • pp.493-498
    • /
    • 2004
  • The objective of the present study is to compare various conventional and recently evolved satellite image-processing techniques and to ascertain the best possible technique that can identify and position aquaculture farms accurately in and around the Younggwang coastal area. Several conventional techniques performed to extract such information from the Landsat-TM imagery do not seem to yield better information about the aquaculture farms, and lead to misclassification. The large errors between the actual and extracted aquaculture farm information are due to the existence of spectral confusion and the inadequate spatial resolution of the sensor. This leads to the possible occurrence of mixture pixels, or 'mixels', which are a source of errors in the classification techniques. Understanding the confusion and mixture pixel problems requires the development of efficient methods that can enable more reliable extraction of aquaculture farm information. Thus, more recently evolved methods such as step-by-step partial spectral end-member extraction and linear spectral unmixing are introduced. The former assumes that an end-member, which is often referred to as the 'spectrally pure signature' of a target feature, does not appear in a spectrally pure form, but is always mixed with the other features in certain proportions. The assumption of linear spectral unmixing is that the measured reflectance of a pixel is the linear sum of the reflectances of the mixture components that make up that pixel. The classification accuracy of the step-by-step partial end-member extraction improved significantly compared to that obtained from the traditional supervised classifiers. However, this method did not distinguish between the aquaculture ponds and non-aquaculture ponds within the aquaculture farming areas. In contrast, the linear spectral unmixing model produced a set of fraction images for aquaculture, water and soil. Of these, the aquaculture fraction yields good estimates of the proportion of aquaculture farm in each pixel. The acquired proportion was compared with the values of NDVI, and the two are positively correlated ($R^2 = 0.91$), indicating the reliability of the sub-pixel classification.
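
The linear mixing assumption stated above can be sketched for the simplest case of two end-members whose fractions sum to one. This is an illustrative simplification, since the study itself reports three fraction images (aquaculture, water and soil). The spectra below are made-up numbers, not Landsat-TM measurements, and the function names are hypothetical.

```python
def mix(fraction, end_a, end_b):
    """Forward model: pixel reflectance as a linear sum of two end-members."""
    return [fraction * a + (1.0 - fraction) * b for a, b in zip(end_a, end_b)]


def unmix_two(pixel, end_a, end_b):
    """Least-squares fraction of end_a in the pixel, clipped to [0, 1].

    Assumes exactly two end-members whose fractions sum to one.
    """
    num = sum((p - b) * (a - b) for p, a, b in zip(pixel, end_a, end_b))
    den = sum((a - b) ** 2 for a, b in zip(end_a, end_b))
    return min(1.0, max(0.0, num / den))


# Hypothetical three-band reflectance spectra (illustrative values only).
aquaculture = [0.10, 0.20, 0.40]
water = [0.05, 0.04, 0.02]

pixel = mix(0.3, aquaculture, water)       # a pixel that is 30% aquaculture
estimated = unmix_two(pixel, aquaculture, water)
print(round(estimated, 6))
```

With more than two end-members the same idea becomes a constrained least-squares problem over all fractions, which is usually solved with a numerical library rather than a closed form.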

Development and Validation of a Computerized Semi-Quantitative Food Frequency Questionnaire Program for Evaluating the Nutritional Status of the Korean Elderly (한국인 50세 이상 성인과 노인을 위한 반정량 식품섭취빈도 조사지의 개발 및 타당도 검증)

  • 최혜미;이해정;박선주;김정희;김초일;장경자;임경숙;김경원
    • Korean Journal of Community Nutrition
    • /
    • v.7 no.2
    • /
    • pp.277-285
    • /
    • 2002
  • The purpose of this study was to develop a semi-quantitative food frequency questionnaire (SQ-FFQ) for subjects aged 50 years and over and to evaluate the validity of this SQ-FFQ. Dietary intake was assessed using an SQ-FFQ that included 98 commonly consumed food items selected from the results of the Korean Health and Nutritional Survey, 1998. Subjects (n = 2,660) aged 50 years and over were recruited from 7 metropolitan cities and 8 small cities. Each subject was interviewed using this SQ-FFQ, developed in our laboratory, and the 24-hour recall method. After excluding incomplete data, data from 1,149 subjects were used in this validity study. The nutrient intakes assessed by this SQ-FFQ were validated by comparing them with the results from 1-day 24-hour recalls. Pearson's correlation coefficients between the two methods were 0.71, 0.64, 0.53, and 0.43 for energy, carbohydrate, protein, and fat, respectively, for all subjects. Spearman's correlation coefficients were higher than Pearson's correlation coefficients. Kappa values for energy, carbohydrate, protein, and fat were 0.79, 0.72, 0.70, and 0.64, respectively. The percentage of misclassification of the lowest quartile into the highest quartile or vice versa was 1.25-1.39% for all nutrients. Therefore, this SQ-FFQ seems to be useful in assessing the nutritional status of middle-aged and elderly subjects in Korea.
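
The quartile misclassification figure reported above can be computed along the following lines. This is a minimal sketch that assumes rank-based quartiles within each method; the abstract does not say how the quartiles were formed. The intake values are invented, and the function names are hypothetical.

```python
def quartiles(values):
    """Return a quartile index (0..3) for each value, based on its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    result = [0] * n
    for rank, i in enumerate(order):
        result[i] = rank * 4 // n
    return result


def gross_misclassification_pct(method_a, method_b):
    """Percent of subjects in the lowest quartile by one method
    and in the highest quartile by the other."""
    qa, qb = quartiles(method_a), quartiles(method_b)
    extreme = sum(1 for a, b in zip(qa, qb) if {a, b} == {0, 3})
    return 100.0 * extreme / len(qa)


# Hypothetical energy intakes (kcal) for eight subjects.
ffq = [1200, 1500, 1800, 2100, 2400, 2700, 3000, 3300]
recall = [3400, 1600, 1700, 2000, 2500, 2600, 2900, 3100]

pct = gross_misclassification_pct(ffq, recall)
print(pct)  # only the first subject falls in opposite extreme quartiles
```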

Combined Application of Data Imbalance Reduction Techniques Using Genetic Algorithm (유전자 알고리즘을 활용한 데이터 불균형 해소 기법의 조합적 활용)

  • Jang, Young-Sik;Kim, Jong-Woo;Hur, Joon
    • Journal of Intelligence and Information Systems
    • /
    • v.14 no.3
    • /
    • pp.133-154
    • /
    • 2008
  • The data imbalance problem, which can be encountered in data mining classification problems, typically means that there are many more or far fewer instances in one class than in the other classes. To solve the data imbalance problem, a number of techniques have been proposed based on re-sampling with replacement, adjusting decision thresholds, and adjusting the cost of the different classes. In this paper, we study the feasibility of combining the techniques previously proposed to deal with the data imbalance problem, and suggest a combination method that uses a genetic algorithm to find the optimal combination ratio of the techniques. To improve the prediction accuracy of a minority class, we determine the combination ratio using the F-value of the minority class as the fitness function of the genetic algorithm. To compare the performance with that of single techniques and the matrix-style combination of random percentages, we performed experiments using four public datasets which have been generally used to compare the performance of methods for the data imbalance problem. The experimental results show the usefulness of the proposed method.
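
The F-value of the minority class, used above as the fitness function, can be sketched as follows. The abstract does not state which variant of the F-measure was used, so the balanced form (`beta=1.0`) is an assumption here, and the labels are toy data.

```python
def minority_f_value(y_true, y_pred, minority=1, beta=1.0):
    """F-measure computed for the minority class only."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    fp = sum(1 for t, p in pairs if t != minority and p == minority)
    fn = sum(1 for t, p in pairs if t == minority and p != minority)
    if tp == 0:
        return 0.0  # no minority instance was predicted correctly
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical imbalanced data: 3 minority (1) and 7 majority (0) instances.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

f_value = minority_f_value(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(f_value, 3), accuracy)
```

Unlike overall accuracy, this score ignores correctly classified majority instances, so a candidate that neglects the minority class receives a low fitness.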

An Effective Shadow Elimination Method Using Adaptive Parameters Update (적응적 매개변수 갱신을 통한 효과적인 그림자 제거 기법)

  • Kim, Byeoung-Su;Lee, Gwang-Gook;Yoon, Ja-Young;Kim, Jae-Jun;Kim, Whoi-Yul
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.45 no.3
    • /
    • pp.11-19
    • /
    • 2008
  • Background subtraction, which separates moving objects in video sequences, is an essential technology for object recognition and tracking. However, background subtraction methods are often confused by shadow regions, and this misclassification of shadow regions disrupts subsequent processing that perceives the shapes or exact positions of moving objects. This paper proposes a method for shadow elimination based on shadow modeling with color information and a Bayesian classification framework. In addition, because the modeling parameters are updated dynamically, the proposed method adapts to illumination changes. Experimental results showed that the proposed method can eliminate shadow regions effectively even under varying lighting conditions.

Feasibility Evaluation of High-Tech New Product Development Projects Using Support Vector Machines

  • Shin, Teak-Soo;Noh, Jeon-Pyo
    • Proceedings of the Korea Intelligent Information System Society Conference
    • /
    • 2005.11a
    • /
    • pp.241-250
    • /
    • 2005
  • New product development (NPD) is defined as the transformation of a market opportunity and a set of assumptions about product technology into a product available for sale. Managers charged with project selection decisions in the NPD process, such as go/no-go choices and specific resource allocation decisions, are faced with a complicated problem. Therefore, the ability to develop successful new products has been identified as a major determinant in sustaining a firm's competitive advantage. The purpose of this study is to develop a new evaluation model for NPD project selection in the high-tech industry using support vector machines (SVM). The evaluation model is developed in two phases. In the first phase, a binary (go/no-go) classification prediction model, i.e., an SVM for high-tech NPD project selection, is developed. In the second phase, using the predicted output value of the SVM, a feasibility grade is calculated for the final NPD project decision making. In this study, the feasibility grades are divided into three grade levels. We assume that the frequency of NPD project cases is symmetrically determined according to the feasibility grades and that misclassification errors are partially minimized by the multiple grades. However, the horizon of the grade levels can be changed according to a firm's NPD strategy. Our proposed feasibility grade method is more reasonable for NPD decision problems because it considers, in particular, the risk factor of NPD from the viewpoint of future NPD success probability. In our empirical study using Korean NPD cases, the SVM significantly outperformed ANN and logistic regression as benchmark models in hit ratio. The feasibility grades generated from the predicted output value of the SVM also showed that they can offer a useful guideline for NPD project selection.

Improvement of Land Cover / Land Use Classification by Combination of Optical and Microwave Remote Sensing Data

  • Duong, Nguyen Dinh
    • Proceedings of the KSRS Conference
    • /
    • 2003.11a
    • /
    • pp.426-428
    • /
    • 2003
  • Optical and microwave remote sensing data have been widely used in land cover and land use classification. Thanks to the spectral absorption characteristics of ground objects in the visible and near-infrared regions, optical data make it possible to extract different land cover types according to their material composition, such as water bodies, vegetation cover or bare land. On the other hand, a microwave sensor receives backscatter radiance, which contains information on surface roughness, object density and 3-D structure that is very important complementary information for interpreting land use and land cover. Separate use of these data has brought many successful results in practice. However, the accuracy of the land use / land cover maps established by this methodology still has some problems. One way to improve the accuracy of land use / land cover classification is to combine both optical and microwave data in the analysis. In this research, the author used LANDSAT TM scene 127/45 acquired on October 21, 1992, JERS-1 SAR scene 119/265 acquired on October 27, 1992 and aerial photographs taken on October 21, 1992. The study area was selected in Hanoi City and the surrounding area, Vietnam. This is a flat agricultural area with various land use types, such as water rice, secondary crops like maize and cassava, and vegetable cultivation such as cucumber and tomato, mixed with human settlements and some manufacturing facilities such as brick and ceramic factories. The use of only optical or microwave data could result in misclassification among some land use features, such as settlement and vegetables cultivation using frame stages. By combining multitemporal JERS-1 SAR and TM data, these errors have been eliminated, so that the accuracy of the final land use / land cover map has been improved. The paper describes a methodology for data combination and presents results achieved by the proposed approach.

Weighted L1-Norm Support Vector Machine for the Classification of Highly Imbalanced Data (불균형 자료의 분류분석을 위한 가중 L1-norm SVM)

  • Kim, Eunkyung;Jhun, Myoungshic;Bang, Sungwan
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.1
    • /
    • pp.9-21
    • /
    • 2015
  • The support vector machine (SVM) has been successfully applied to various classification areas due to its flexibility and high level of classification accuracy. However, when analyzing imbalanced data with uneven class sizes, the classification accuracy of the SVM may drop significantly in predicting the minority class because the SVM classifiers are undesirably biased toward the majority class. The weighted $L_2$-norm SVM was developed for the analysis of imbalanced data; however, it cannot identify irrelevant input variables due to the characteristics of the ridge penalty. Therefore, we propose the weighted $L_1$-norm SVM, which uses the lasso penalty to select important input variables and weights to differentiate the misclassification of data points between classes. We demonstrate the satisfactory performance of the proposed method through simulation studies and a real data analysis.
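
For orientation, a weighted $L_1$-norm SVM is commonly written as a class-weighted hinge loss plus a lasso penalty. The form below is a sketch under that assumption and is not taken from the paper; the symbols ($w_{y_i}$ for the class weight, $\lambda$ for the tuning parameter, $\beta_0$ and $\beta$ for the intercept and coefficients of a linear classifier) are introduced here for illustration only.

```latex
\min_{\beta_0,\,\beta}\;
  \sum_{i=1}^{n} w_{y_i}\,\bigl[\,1 - y_i\,(\beta_0 + x_i^{\top}\beta)\,\bigr]_{+}
  \;+\; \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert ,
\qquad y_i \in \{-1, +1\}
```

Here $[u]_+ = \max(u, 0)$. Replacing the last term with $\lambda \sum_{j} \beta_j^2$ would give the ridge-penalized ($L_2$-norm) counterpart, which shrinks coefficients but does not set them exactly to zero.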