• Title/Summary/Keyword: Skewed class distribution


Bayesian Hierarchical Model with Skewed Elliptical Distribution

  • Chung Younshik
    • Proceedings of the Korean Statistical Society Conference / 2000.11a / pp.5-12 / 2000
  • Meta-analysis refers to quantitative methods for combining results from independent studies in order to draw overall conclusions. We consider hierarchical models, including selection models, under a skewed heavy-tailed error distribution, and show that they are useful in such Bayesian meta-analysis. A general class of skewed elliptical distributions is reviewed and developed. This rich class of models combines the information from independent studies, allowing investigation of variability both between and within studies, as well as of the weight function. Here we investigate the sensitivity of results to unobserved studies by considering a hierarchical selection model, and use Markov chain Monte Carlo methods to develop inference for the parameters of interest.


A Study on the development of Criterion Scores for the Attachment Q-set in Korea (애착 Q-set의 국내 준거 개발 연구)

  • Lee, Young;Park, Kyung Ja;Rah, Yu Mee
    • Korean Journal of Child Studies / v.18 no.2 / pp.131-148 / 1997
  • The purpose of this study was to develop criterion scores for the Korean version of the Attachment Q-set. It further examined the distribution of security of attachment scores of Korean infants and differences in attachment scores by cultural background. The criterion scores of attachment security were developed by 8 judges who are knowledgeable in attachment theory and research. They used the Q-set to describe the behavior characteristics of ideally secure infants of 12 and 36 months of age. The distribution of the attachment scores was analyzed with 191 infants, compiled from 4 studies including infants selected for this study. The attachment security criterion scores developed for Korean infants correlated highly with Waters' criterion scores (1987) for American infants: .90 for 12 months and .88 for 36 months of age. The correlation between the attachment scores developed for 12- and 36-month-olds was .89. The attachment security scores of the Korean version were slightly higher and more negatively skewed than scores calculated using the American criterion. There were significant differences in the security of attachment scores by the socioeconomic background of the infants, but not by the employment status of the mothers. Infants of nonemployed middle class mothers had significantly higher security of attachment scores than infants of nonemployed lower class mothers. Infants from lower class families had higher "difficulty" scores, and "enjoying physical contact" scores were higher among infants from the middle class.


Topic Classification for Suicidology

  • Read, Jonathon;Velldal, Erik;Ovrelid, Lilja
    • Journal of Computing Science and Engineering / v.6 no.2 / pp.143-150 / 2012
  • Computational techniques for topic classification can support qualitative research by automatically applying labels in preparation for qualitative analyses. This paper presents an evaluation of supervised learning techniques applied to one such use case, namely, that of labeling emotions, instructions and information in suicide notes. We train a collection of one-versus-all binary support vector machine classifiers, using cost-sensitive learning to deal with class imbalance. The features investigated range from a simple bag-of-words and n-grams over stems, to information drawn from syntactic dependency analysis and WordNet synonym sets. The experimental results are complemented by an analysis of systematic errors in both the output of our system and the gold-standard annotations.
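
As a rough illustration of the cost-sensitive, one-versus-all setup mentioned in this abstract, the sketch below derives per-class misclassification weights that are inversely proportional to class frequency. The weighting rule (n_samples / (n_classes * class_count)) is a common heuristic and is an assumption here, as are the label names and counts; the abstract does not state how the authors set their costs.

```python
from collections import Counter


def one_vs_all_targets(labels, positive):
    """Turn multi-class labels into binary targets for one classifier."""
    return [1 if label == positive else 0 for label in labels]


def balanced_class_weights(targets):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so errors on them cost more.
    This rule is an illustrative assumption, not the paper's method.
    """
    counts = Counter(targets)
    n_samples = len(targets)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution: 6 "information", 3 "instructions", 1 "emotion".
labels = ["information"] * 6 + ["instructions"] * 3 + ["emotion"] * 1

# Binary problem for the rarest topic: 1 positive versus 9 negatives.
targets = one_vs_all_targets(labels, "emotion")
weights = balanced_class_weights(targets)
```

Weights of this kind would typically be passed to a classifier's per-class cost parameter, so that a missed minority example is penalized more heavily than a missed majority example.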

Optimization of Classifier Performance at Local Operating Range: A Case Study in Fraud Detection

  • Park Lae-Jeong;Moon Jung-Ho
    • International Journal of Fuzzy Logic and Intelligent Systems / v.5 no.3 / pp.263-267 / 2005
  • Building classifiers for real-world financial classification problems is often plagued by severely overlapping and highly skewed class distributions. Performance measures such as the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) have recently been introduced for evaluating and building classifiers for these kinds of problems. They are, however, ineffective for evaluating a classifier's discrimination performance in the class of problems where interest lies only in a local operating range of the classifier. In this paper, a new method is proposed that enables us to directly improve a classifier's discrimination performance at a desired local operating range by defining and optimizing a partial area under the ROC curve or a domain-specific curve, which is difficult to achieve with conventional learning methods based on classification accuracy. The effectiveness of the proposed approach is demonstrated in terms of fraud detection capability in a real-world fraud detection problem, compared with the MSE-based approach.
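
The partial area under the ROC curve referred to in this abstract can be sketched as follows. This is a minimal illustration that restricts the area to false positive rates in [0, max_fpr] and integrates with the trapezoidal rule; the function names, the toy scores, and the choice of a false-positive-rate range are assumptions, and the paper's own measure (including its domain-specific curve) may be defined differently.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr), sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(labels, scores, max_fpr):
    """Trapezoidal area under the ROC curve for fpr in [0, max_fpr]."""
    points = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the last segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data: 3 positives (e.g. fraud) and 5 negatives, scored by a hypothetical model.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

full_area = partial_auc(labels, scores, max_fpr=1.0)
local_area = partial_auc(labels, scores, max_fpr=0.2)
```

With max_fpr=1.0 the function reduces to the ordinary AUC, while a small max_fpr scores only the low-false-alarm region that matters when few transactions can be reviewed.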

A Comparison of Ensemble Methods Combining Resampling Techniques for Class Imbalanced Data (데이터 전처리와 앙상블 기법을 통한 불균형 데이터의 분류모형 비교 연구)

  • Lee, Hee-Jae;Lee, Sungim
    • The Korean Journal of Applied Statistics / v.27 no.3 / pp.357-371 / 2014
  • There are many studies related to imbalanced data in which the class distribution is highly skewed. To address the problem of imbalanced data, previous studies have used resampling techniques that correct the skewness of the class distribution in each sampled subset through under-sampling, over-sampling, or hybrid sampling such as SMOTE. Ensemble methods have also alleviated the problem of class-imbalanced data. In this paper, we compare around a dozen algorithms that combine ensemble methods with resampling techniques, based on simulated data sets generated by the Backbone model, which can handle the imbalance rate. Results on various real imbalanced data sets are also presented to compare the effectiveness of the algorithms. As a result, we highly recommend combining resampling techniques with ensemble methods for imbalanced data in which the proportion of the minority class is less than 10%. We also find that each ensemble method has a well-matched sampling technique. Algorithms that combine bagging or random forest ensembles with random under-sampling tend to perform well, whereas the boosting ensemble appears to perform better with over-sampling. All ensemble methods combined with SMOTE perform well in most situations.
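
The pairing of bagging with random under-sampling described in this abstract can be sketched roughly as below: each bag keeps every minority example and an equally sized random subset of each larger class, and a base learner would then be trained on each bag. The function names, the 95:5 toy data, and the fixed seed are assumptions for illustration; the paper's exact procedures are not given in the abstract.

```python
import random
from collections import Counter


def random_undersample(X, y, rng):
    """Keep as many examples of each class as the smallest class has."""
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(indices) for indices in by_class.values())
    keep = []
    for indices in by_class.values():
        # Sampling without replacement; the smallest class is kept whole.
        keep.extend(rng.sample(indices, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def balanced_bags(X, y, n_bags, seed=0):
    """Draw n_bags independently under-sampled training sets."""
    rng = random.Random(seed)
    return [random_undersample(X, y, rng) for _ in range(n_bags)]


# Toy data with a 5% minority class (label 1); features are just row ids.
X = list(range(100))
y = [0] * 95 + [1] * 5

bags = balanced_bags(X, y, n_bags=3)
bag_class_counts = [Counter(y_bag) for _, y_bag in bags]
```

Because each bag draws a different subset of the majority class, the ensemble as a whole still sees most of the majority data even though each base learner trains on a balanced sample.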

Cost-sensitive Learning for Credit Card Fraud Detection (신용카드 사기 검출을 위한 비용 기반 학습에 관한 연구)

  • Park Lae-Jeong
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.5 / pp.545-551 / 2005
  • The main objective of fraud detection is to minimize the costs or losses incurred due to fraudulent transactions. However, because of the nature of the problem, which involves highly skewed, overlapping class distributions and non-uniform misclassification costs, it is practically difficult to generate a classifier that is near-optimal in terms of classification costs at a desired operating range of rejection rates. This paper defines a performance measure that reflects a classifier's costs at a specific operating range and offers a cost-sensitive learning approach that enables us to train classifiers suitable for real-world credit card fraud detection by directly optimizing the performance measure with evolutionary programming. The experimental results demonstrate that the proposed approach provides an effective way of training cost-sensitive classifiers for successful fraud detection, compared to other training methods.

Estimation of Optimal Harvest Volume for the Long-term Forest Management Planning using Goal Programming (장기산림경영계획의 목표수확량 산출을 위한 목표계획법의 적용)

  • Won, Hyun-Kyu;Kim, Young-Hwan;Kwon, Soon-Duk
    • Journal of Korean Society of Forest Science / v.98 no.1 / pp.125-131 / 2009
  • To facilitate sustainable forest management, the Forest Service in Korea has designated 2.9 million hectares of forest as 'intensive management forests' and encouraged local governments to develop strategic management plans for their forests. One of the problems for sustainable forest management in Korea is the skewed distribution of forest age classes. Currently, the majority of forestland in Korea is occupied by age classes III and IV. In this study, we sought an optimum harvest volume that would make the intensive management forest in Youngdong-Gun evenly distributed across age classes and allow an even harvest volume over a 50-year time horizon. To develop an optimization model, we applied the goal programming technique, which is suited to multi-purpose management planning. The results indicated that it is necessary to harvest 1.2 million cubic meters in each decade to achieve the most stable distribution of age classes for the study site. The harvest volume target resulting from this study could be used in management planning or an associated policy-making process in the future.

Bike Insurance Fraud Detection Model Using Balanced Randomforest Algorithm (균형 랜덤 포레스트를 이용한 이륜차 보험사기 적발 모형 개발)

  • Kim, Seunghoon;Lee, Soo Il;Kim, Tae ho
    • Journal of Digital Convergence / v.20 no.2 / pp.241-250 / 2022
  • Due to the COVID-19 pandemic, with increased 'untact' services and an unstable household economy, bike insurance fraud is expected to surge. Moreover, fraud methods are becoming more complicated. However, no fraud detection model for bike insurance exists. We deal with the issue of skewed class distribution and reflect the criteria of fraud detection experts. We use a balanced random-forest algorithm to develop an efficient bike insurance fraud detection model. As a result, while the predictive performance of the balanced random-forest model is superior to that of the non-balanced model, there is no significant difference between the variables used by the experts and those in the confirmatory models. The important variables for detecting fraud turn out to be the age and gender of the driver, the correspondence between the insured and the driver, the amount of the self-repairing claim, and the amount of bodily injury liability.