Integrated Search | Korea Science

Classification Analysis for Unbalanced Data

김동아;강수연;송종우
- 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 28, No. 3 / pp. 495-509 / 2015
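In typical two-class classification, the proportions of the two classes often do not differ greatly. This paper addresses the classification of unbalanced data, in which the proportions of the two classes differ substantially. Unbalanced data are often harder to classify than balanced data. When an ordinary classification model is applied to such data, most observations tend to be assigned to the larger class, yet in practical applications this kind of misclassification usually incurs the greater loss. We compare the performance of various classification methods combined with sampling techniques. We also compare which method yields the smallest loss under an assumed asymmetric loss. Performance is compared using the misclassification rate, G-mean, ROC, and AUC (area under the curve).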
https://doi.org/10.5351/KJAS.2015.28.3.495
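
The evaluation measures named in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy labels, and the cost values `cost_fn=5.0` and `cost_fp=1.0` are assumptions chosen only to show why the misclassification rate alone can be misleading on unbalanced data, while G-mean and an asymmetric loss expose the problem.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false negatives, true negatives, false positives."""
    tp = fn = tn = fp = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp

def misclassification_rate(y_true, y_pred):
    """Fraction of observations whose predicted label differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

def asymmetric_loss(y_true, y_pred, cost_fn=5.0, cost_fp=1.0, positive=1):
    """Total loss when a missed minority case costs more than a false alarm.

    The cost values are hypothetical, not taken from the paper.
    """
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    return cost_fn * fn + cost_fp * fp

# Toy data: 5 minority cases (label 1) and 95 majority cases (label 0).
y_true = [1] * 5 + [0] * 95
# A classifier that assigns every observation to the majority class.
y_pred = [0] * 100

print(misclassification_rate(y_true, y_pred))  # 0.05
print(g_mean(y_true, y_pred))                  # 0.0
print(asymmetric_loss(y_true, y_pred))         # 25.0
```

The all-majority classifier looks good on the misclassification rate, but its G-mean is zero because no minority case is detected, and every missed minority case adds to the asymmetric loss.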

Fault Detection of Unbalanced Cycle Signal Data Using SOM-based Feature Signal Extraction Method

김송이;강지훈;박종혁;김성식;백준걸
- 한국시뮬레이션학회논문지 (Journal of the Korea Society for Simulation) / Vol. 21, No. 2 / pp. 79-90 / 2012
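This study proposes a feature signal extraction method to improve the performance of fault detection algorithms when the process signals are unbalanced data. Unbalanced data refers to a classification problem in which the proportion of data belonging to one class differs greatly from that of the other classes, so that fault detection performance degrades considerably. Because the number of fault signals obtainable while a process is running is very small compared with the number of normal signals, resolving this problem before applying fault detection methods is very important. To address the imbalance, the SOM (Self-Organizing Map) algorithm is used: the weights corresponding to each node are treated as feature signals, which balances the ratio of normal to fault data. For fault detection on the feature signal data, the classification methods kNN (k-Nearest Neighbor) and SVM (Support Vector Machine) are applied, and their performance is compared with Hotelling's $T^2$ control chart, which is commonly used for fault detection in process signals. The superiority of the algorithm's performance is verified by simulating process signals known to occur in semiconductor processes.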
https://doi.org/10.9709/JKSS.2012.21.2.079

A Study on the Unbalanced Current Distribution of HTS Power Cable

김재호;박충화
- 한국안전학회지 (Journal of the Korean Society of Safety) / Vol. 27, No. 6 / pp. 43-47 / 2012
Unbalanced currents flow in High Temperature Superconducting (HTS) power cables as a result of asymmetrical faults, harmonic distortion, and unbalanced loads. These currents cause additional loss and leakage fields in the HTS power cable and degrade electric power quality and stability. In addition, large unbalanced currents can cause negative sequence and ground relays to operate. This paper presents an analysis, using PSCAD/EMTDC, of the unbalanced three-phase current distribution in an HTS power cable under unbalanced load conditions and different grounding methods. The results of the analysis provide important data for the design of HTS power cables and useful information for their installation in power systems.
https://doi.org/10.14346/JKOSOS.2012.27.6.043

A Data Mining Procedure for Unbalanced Binary Classification

정한나;이정화;전치혁
- 대한산업공학회지 (Journal of the Korean Institute of Industrial Engineers) / Vol. 36, No. 1 / pp. 13-21 / 2010
Predicting customers' contract cancellations is essential for insurance companies, but it is a difficult problem because the customer database is large and the target (cancelled) customers make up only a small proportion of it. This paper proposes a new data mining approach to binary classification that handles large-scale unbalanced data. The proposed approach incorporates over-sampling, clustering, regularized logistic regression, and boosting. It was applied to a real data set from the insurance sector, and the results were compared with those of several other classification techniques.

Discriminant analysis for unbalanced data using HDBSCAN

이보희;김태헌;최용석
- 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 34, No. 4 / pp. 599-609 / 2021
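Data in which the numbers of observations differ greatly between groups are called unbalanced data. In discriminant analysis of unbalanced data, classifying minority-class observations correctly is more important than classifying majority-class observations correctly. However, observations from the relatively small minority class are often misclassified into the relatively large majority class. To address this, this study proposes a method that combines HDBSCAN and SMOTE. HDBSCAN is used to remove noise from both the minority and majority classes, and SMOTE is then applied to generate new data. AUC and F1 score were used to compare performance with existing methods. In most cases the combined HDBSCAN and SMOTE method showed higher performance measures and proved to be an excellent method for classifying unbalanced data.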
https://doi.org/10.5351/KJAS.2021.34.4.599
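
The SMOTE step mentioned in the abstract above generates synthetic minority observations by interpolating between a minority point and one of its nearest minority neighbours. The following is a rough sketch of that interpolation idea only, not the authors' procedure: the function name `smote_sketch`, its parameters, and the toy points are assumptions, and the HDBSCAN noise-removal step is not shown.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (at least two points).
    k: number of nearest minority neighbours to choose from (hypothetical default).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        u = rng.random()
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Toy minority class: four corners of the unit square.
minority_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_sketch(minority_points, n_new=10, k=2, seed=1)
print(len(new_points))  # 10
```

Because each synthetic point lies on a segment between two existing minority points, noisy minority observations would spread their influence into the new data, which is consistent with the abstract's motivation for removing noise before applying SMOTE.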

Dietary Habit and Unbalanced Diet Status of Young Children by Age

정유미
- 한국식생활문화학회지 (Journal of the Korean Society of Food Culture) / Vol. 34, No. 5 / pp. 587-594 / 2019
This study investigated the general information, unbalanced diet, and dietary habits of 86 children in Daegu. The research was undertaken to analyze the current state of diet and dietary habits of children, and to provide basic data for nutrition education. The results reveal that younger children have a more unbalanced diet. Children dislike side-dishes the most. Furthermore, due to the longer time taken to consume food, parents persuade children to eat quickly. Children were also determined to have a high intake of foods and drinks containing sugar; beverages containing sugar are consumed 1-2 times a week by 5-year-olds, and once daily by 6- and 7-year-olds. The results of this study can be applied to provide basic data for nutritional education, and assist in the development of dietary programs for young children.
https://doi.org/10.7318/KJFC/2019.34.5.587

Noninformative Priors for Fieller-Creasy Problem using Unbalanced Data

Kim, Dal-Ho;Lee, Woo-Dong;Kang, Sang-Gil
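- 한국데이터정보과학회:학술대회논문집 (Korean Data and Information Science Society: Conference Proceedings) / 2005 Autumn Conference of the Korean Data and Information Science Society / pp. 71-84 / 2005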
The Fieller-Creasy problem involves statistical inference about the ratio of two independent normal means. It is a difficult problem from either a frequentist or a likelihood perspective. As an alternative, a Bayesian analysis with noninformative priors may provide a solution. In this paper, we extend the results of Yin and Ghosh (2001) to the unbalanced sample case. We find various noninformative priors, such as first and second order matching priors, reference priors, and Jeffreys' priors. The posterior propriety under the proposed noninformative priors is given. Using real data, we provide illustrative examples. Through a simulation study, we compute the frequentist coverage probabilities for the probability matching and reference priors. Some simulation results are given.

Integrated Partial Sufficient Dimension Reduction with Heavily Unbalanced Categorical Predictors

Yoo, Jae-Keun
- 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 23, No. 5 / pp. 977-985 / 2010
In this paper, we propose an approach to partial sufficient dimension reduction with heavily unbalanced categorical predictors. To this end, we consider integrated categorical predictors and investigate conditions under which the integrated categorical predictor is fully informative for partial sufficient dimension reduction. For illustration, the proposed approach is implemented with optimal partial sliced inverse regression in simulation and data analysis.
https://doi.org/10.5351/KJAS.2010.23.5.977

RDP: A storage-tier-aware Robust Data Placement strategy for Hadoop in a Cloud-based Heterogeneous Environment

Muhammad Faseeh Qureshi, Nawab;Shin, Dong Ryeol
- KSII Transactions on Internet and Information Systems (TIIS) / Vol. 10, No. 9 / pp. 4063-4086 / 2016
Cloud computing is a robust technology that helps resolve many parallel and distributed computing issues in the modern Big Data environment. Hadoop is an ecosystem that processes large data sets in a distributed computing environment. HDFS is the Hadoop filesystem, which distributes data blocks to the cluster nodes. Data block placement has become a bottleneck for overall performance in a Hadoop cluster. The current placement policy assumes that all Datanodes have equal computing capacity to process data blocks, where computing capacity includes the availability of the same storage media and the same processing performance on each node. As a result, Hadoop cluster performance is affected by unbalanced workloads, inefficient use of storage tiers, network traffic congestion, and HDFS integrity issues. This paper proposes a storage-tier-aware Robust Data Placement (RDP) scheme, which systematically resolves unbalanced workloads, reduces network congestion to an optimal state, utilizes storage tiers in a useful manner, and minimizes HDFS integrity issues. The experimental results show that the proposed approach reduced the unbalanced workload issue to 72%. Moreover, the presented approach resolves the storage-tier compatibility problem to 81% by predicting storage for block jobs, and improves overall data block placement by 78% through pre-calculated computing capacity allocations and execution of map files over the respective Namenode and Datanodes.
https://doi.org/10.3837/tiis.2016.09.003

Empirical Statistical Power for Testing Multilocus Genotypic Effects under Unbalanced Designs Using a Gibbs Sampler

Lee, Chae-Young
- Asian-Australasian Journal of Animal Sciences / Vol. 25, No. 11 / pp. 1511-1514 / 2012
Epistasis, which may explain a large portion of the phenotypic variation in complex economic traits of animals, has been ignored in many genetic association studies. A Bayesian method was introduced to draw inferences about multilocus genotypic effects based on their marginal posterior distributions obtained with a Gibbs sampler. A simulation study was conducted to estimate statistical power under various unbalanced designs using this method. Data were simulated under combined designs of number of loci, within-genotype variance, and sample size, in unbalanced designs with or without null combined genotype cells. Mean empirical statistical power was estimated for testing the posterior mean estimate of the combined genotype effect. A practical example of obtaining empirical statistical power estimates for a given sample size under unbalanced designs is provided. The empirical statistical powers would be useful for determining an optimal design when interactive associations of multiple loci with complex phenotypes are examined.
https://doi.org/10.5713/ajas.2012.12133

324 search results (processing time: 0.028 s)