• Title/Abstract/Keywords: Under-Sampling

1,087 search results (processing time: 0.025 seconds)

데이터 불균형 해결을 위한 Under-Sampling 기반 앙상블 SVMs (EUS SVMs: Ensemble of Under-Sampled SVMs for Data Imbalance Problems)

  • 강필성;조성준
    • 한국경영과학회:학술대회논문집 / 대한산업공학회/한국경영과학회 2006년도 춘계공동학술대회 논문집 / pp.291-298 / 2006
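  • In pattern recognition problems, data imbalance is said to occur when the number of instances in one class is far larger or far smaller than the number in another class. Like other machine learning algorithms, the Support Vector Machine (SVM) is trained and produces predictions under the assumption that the class proportions in the training data are roughly equal. In real problems, however, data imbalance occurs very frequently, and in such cases model performance degrades severely. This paper uses two-dimensional artificial data to examine how data imbalance actually affects SVM classification results. To resolve the imbalance, it proposes an under-sampling-based ensemble of SVMs. When applied to two artificial datasets, the proposed method proved effective: compared with existing methods for handling data imbalance, it achieved high performance and stability even when the minority class had very few instances and the imbalance was very severe.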

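The under-sampling ensemble idea described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the sampling or aggregation details, so random under-sampling without replacement and majority voting are assumptions, and the `NearestCentroid` base learner is a dependency-free stand-in for an SVM. All function and parameter names are hypothetical.

```python
import random
from collections import Counter


class NearestCentroid:
    """Stand-in base learner (NOT an SVM): predicts the class with the closest mean."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda label: sq_dist(self.centroids[label]))


def fit_undersampled_ensemble(X, y, minority, n_models=5, seed=0,
                              make_learner=NearestCentroid):
    """Train n_models learners, each on all minority points plus an
    equally sized random subset of the majority points (assumed scheme)."""
    rng = random.Random(seed)
    minority_rows = [x for x, t in zip(X, y) if t == minority]
    majority = [(x, t) for x, t in zip(X, y) if t != minority]
    models = []
    for _ in range(n_models):
        # Under-sample: draw as many majority points as there are minority points.
        sampled = rng.sample(majority, len(minority_rows))
        X_bal = minority_rows + [x for x, _ in sampled]
        y_bal = [minority] * len(minority_rows) + [t for _, t in sampled]
        models.append(make_learner().fit(X_bal, y_bal))
    return models


def predict_majority_vote(models, x):
    """Aggregate the ensemble by simple majority vote (assumed rule)."""
    votes = Counter(m.predict_one(x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D data: 200 majority points near (0, 0), 10 minority points near (4, 4).
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(10)]
y = [0] * 200 + [1] * 10
models = fit_undersampled_ensemble(X, y, minority=1)
```

Swapping in a real SVM would only require a `make_learner` that returns an object with the same `fit` and `predict_one` methods; each ensemble member still sees a balanced training set.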

이단계 집락추출에서의 표본크기에 대한 연구 (A Study of Sample Size for Two-Stage Cluster Sampling)

  • 송종호;제해성;박민규
    • 응용통계연구 / Vol. 24, No. 2 / pp.393-400 / 2011
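  • Under practical constraints such as survey cost and time, cluster sampling, which selects clusters (sets of observation units), is commonly used in most large-scale surveys. In particular, when the observation units within a cluster are very similar, two-stage cluster sampling, which surveys a subsample of each selected cluster rather than all of its observations, is preferred. In two-stage cluster sampling, the sample sizes for the primary sampling units (PSUs), which are the clusters, and the secondary sampling units (SSUs), which are the observation units, depend on the given cost and on the precision of the statistics computed from the sample. This study generalizes the existing derivation of the optimal PSU and SSU sample sizes, which assumes PSUs of equal size, to obtain the optimal sample sizes when PSU sizes are unequal. The result is applied to the sampling design for the 4th Discharged Patient Survey and compared with the existing method, and on that basis sample sizes are proposed for the 7th Discharged Patient Survey.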
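
For orientation, the equal-size case that the abstract takes as its starting point can be sketched with the classical textbook result. The sketch assumes a linear cost model C = c1·n + c2·n·m and a variance proportional to [1 + (m − 1)ρ]/(n·m), where ρ is the intraclass correlation and finite population corrections are ignored. It does not reproduce the paper's generalization to unequal PSU sizes, which the abstract does not spell out. The function and parameter names are hypothetical.

```python
import math


def optimal_two_stage_sizes(total_cost, cost_per_psu, cost_per_ssu, icc):
    """Classical equal-cluster-size allocation (textbook result, not the paper's).

    Minimizing variance under the cost constraint gives
        m_opt = sqrt((c1 / c2) * (1 - rho) / rho),
    and n is then the number of PSUs that the budget affords.
    """
    if not 0 < icc < 1:
        raise ValueError("icc must lie strictly between 0 and 1")
    m_opt = math.sqrt((cost_per_psu / cost_per_ssu) * (1 - icc) / icc)
    m = max(1, round(m_opt))  # SSUs sampled per PSU
    n = int(total_cost // (cost_per_psu + cost_per_ssu * m))  # PSUs sampled
    return n, m, m_opt
```

The formula reflects the trade-off the abstract describes: the more alike the units within a cluster are (larger ρ), the fewer SSUs per cluster are worth surveying.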

여과지가 장착된 3단 카세트를 이용한 입자상물질 채취용 펌프의 유량성능 평가방법 (Development of an Evaluation Method for Flow Rate Performance of Particulate Sampling Pump using Three-pieces Cassette Holder Containing Filters)

  • 송호준;김남희;김기연;마혜란;이광용;정지연
    • 한국산업보건학회지 / Vol. 23, No. 4 / pp.348-355 / 2013
  • Objectives: In working environment measurement, sampling is an important stage for obtaining reliable analytical results. A personal air sampling pump is one of the most fundamental and important elements of work environment measurement, but current practice goes no further than calibrating the pump's flow rate before and after sampling. There is no check of whether the flow rate set at the start is maintained during sampling. The purpose of this study was to develop a method to evaluate the flow rate performance of particulate sampling pumps using the three-piece cassette holder containing filters that is commonly used to sample particulates. Materials and methods: We tested the back pressure of particulate sampling pumps commonly used in Korea with three-piece cassette holders containing various filters, and sought filter combinations that produce the back pressure required by ISO standard 13137. Results: We identified a matrix of sampling media, namely three-piece cassette holders containing filters, that provides the pressure drop required by the ISO standard for evaluating flow rate stability under increasing pressure drop and long-term (8-hour) performance. Conclusions: The evaluation method proposed in this study, which uses a sampling media matrix to check flow rate stability, could be a very useful tool for identifying well-performing pumps before sampling.

임분재적(林分材積) 추정(推定)에 관(關)한 연구(硏究) (Study on the Estimate of Stand Volume in the Pitch Pine Forest)

  • 이여하
    • 한국산림과학회지 / Vol. 18, No. 1 / pp.1-7 / 1973
  • This study compared several methods of estimating stand volume in a pure planted stand of pitch pine: the single-class method, simple random sampling, combined ratio estimation, separate ratio estimation, and sample-tree ratio estimation. According to the ratio estimation results shown in Table 8, the actual stand volume fell within the estimated range of stand volume only for simple random sampling and combined ratio estimation. Simple random sampling is presumed to have given good results because the stand is a pure even-aged stand. In terms of ease of measurement and calculation, the methods rank in the order (1) single-class method, (2) simple random sampling, (3) sample-tree ratio estimation, (4) separate ratio estimation, and (5) combined ratio estimation. Considering both precision and the time, cost, and labor needed to measure actual stand volume, simple random sampling is judged to give the best results, at least in even-aged plantations.


Estimation of P(X > Y) when X and Y are dependent random variables using different bivariate sampling schemes

  • Samawi, Hani M.;Helu, Amal;Rochani, Haresh D.;Yin, Jingjing;Linder, Daniel
    • Communications for Statistical Applications and Methods / Vol. 23, No. 5 / pp.385-397 / 2016
  • Stress-strength models have been intensively investigated in the literature with regard to estimating the reliability $\theta = P(X > Y)$ using parametric and nonparametric approaches under different sampling schemes when X and Y are independent random variables. In this paper, we consider the problem of estimating $\theta$ when (X, Y) are dependent random variables with an underlying bivariate distribution. The empirical and kernel estimates of $\theta = P(X > Y)$ based on bivariate ranked set sampling (BVRSS) are considered when (X, Y) are paired dependent continuous random variables. The estimators obtained are compared with their counterparts based on bivariate simple random sampling (BVSRS) in terms of bias and mean square error (MSE). We demonstrate that the suggested estimators based on BVRSS are more efficient than those based on BVSRS. A simulation study is conducted to gain insight into the performance of the proposed estimators. A real data example is provided to illustrate the process.
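
As a point of reference, the simplest estimator in this setting, the empirical estimate of $\theta = P(X > Y)$ from a paired simple random sample, can be sketched as below. It is assumed here to be the proportion of pairs with X greater than Y. The BVRSS-based and kernel estimators studied in the paper are not reproduced, since the abstract does not give their form. The function name is hypothetical.

```python
def empirical_theta(pairs):
    """Empirical estimate of theta = P(X > Y) from paired observations (x, y).

    Because each pair is compared with itself only, dependence between
    X and Y within a pair is handled automatically.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one (x, y) pair is required")
    return sum(1 for x, y in pairs if x > y) / len(pairs)
```

Ties count as "not greater", which is harmless for continuous variables because ties then occur with probability zero.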

Development of New Optimized Sampling method for 3D Shape Recovery in the Presence of Noise

  • Lee, Hyeong-Geun;Jang, Hoon-Seok
    • 한국정보전자통신기술학회논문지 / Vol. 13, No. 2 / pp.113-122 / 2020
  • Noise affects the accuracy of three-dimensional shape recovery. Its occurrence is unpredictable and depends on several mechanical, environmental, and other factors. When two-dimensional image sequences are obtained for shape from focus (SFF), mechanical vibration occurs in the translation stage, causing an error in the three-dimensional shape recovery. To address this issue, mechanical vibration is modeled using Newton's second law and the principle of the rack and pinion gear. Then, an optimal sampling step size that accounts for the mechanical vibration is derived theoretically. Experiments conducted with real objects verify the effectiveness of the proposed sampling step size. In a realistic, noisy environment, this paper explores the potential for more accurate three-dimensional reconstruction of objects by using the optimal sampling step size, which improves on the step sizes reported in a previous study performed under similar conditions.

평균 샘플 수 최소화를 통한 계량형 반복 샘플링 검사의 설계 (A Variables Repetitive Group Sampling Plan for Minimizing Average Sample Number)

  • 박희곤;문영건;전치혁;;이재욱
    • 대한산업공학회지 / Vol. 30, No. 3 / pp.205-212 / 2004
  • This paper proposes a variables repetitive group sampling plan for the case where the quality characteristic follows a normal distribution and has an upper or lower specification limit. The problem is formulated as a nonlinear programming problem in which the objective function to be minimized is the average sample number and the constraints concern the lot acceptance probabilities at the acceptable quality level (AQL) and the limiting quality level (LQL) on the operating characteristic curve. Sampling plan tables are constructed for selecting the plan parameters, indexed by AQL and LQL, for the cases of known and unknown standard deviation. It is shown that the proposed sampling plan significantly reduces the average sample number compared with the single or double sampling plan.
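
The two quantities that the optimization above trades off, the lot acceptance probability and the average sample number (ASN), can be evaluated for a given plan as in the sketch below. It assumes the usual formulation for a normal characteristic with known standard deviation and an upper specification limit U: compute v = (U − x̄)/σ from a sample of size n, accept if v ≥ k_accept, reject if v < k_reject, and otherwise draw a new sample. The abstract does not state the paper's exact formulation, so this is an assumption, and the function and parameter names are hypothetical. The nonlinear search for the ASN-minimizing plan is not shown.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def rgs_plan_performance(p, n, k_accept, k_reject):
    """Return (lot acceptance probability, ASN) at fraction nonconforming p.

    With v ~ N(z_p, 1/n), where z_p is the upper-p quantile of the standard normal:
        P(accept on one sample) = Phi((z_p - k_accept) * sqrt(n))
        P(reject on one sample) = Phi((k_reject - z_p) * sqrt(n))
    Sampling repeats until one of the two happens.
    """
    if not 0 < p < 1 or n < 1 or k_reject > k_accept:
        raise ValueError("need 0 < p < 1, n >= 1 and k_reject <= k_accept")
    z_p = _STD_NORMAL.inv_cdf(1 - p)
    p_accept = _STD_NORMAL.cdf((z_p - k_accept) * sqrt(n))
    p_reject = _STD_NORMAL.cdf((k_reject - z_p) * sqrt(n))
    p_decide = p_accept + p_reject  # probability that a sample ends the procedure
    return p_accept / p_decide, n / p_decide
```

Setting `k_accept == k_reject` removes the "resample" zone and recovers a single sampling plan, for which the ASN equals n.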

임의번호걸기와 시간균형할당표집에 의한 전화조사의 주요결과 (The Major Findings of the Telephone Survey by Random Digit Dialing and Time-Balanced Quota Sampling)

  • 허명회;한상태;김지연;성은하;강현철
    • 한국조사연구학회지:조사연구 / Vol. 12, No. 2 / pp.77-88 / 2011
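  • Telephone surveys in Korea have recently been moving away from telephone-directory-based sampling toward random digit dialing (RDD). However, because most telephone surveys are still conducted with quota sampling by region, sex, and age group, it has been pointed out that they tend to over-represent social groups that are more likely to be at home. Time-balanced quota sampling was proposed in response, but no empirical study had been carried out until now (허명회 황진모 2006). This study compares the results of a survey on the TV viewing environment conducted with RDD combined with conventional quota sampling against the results obtained with RDD combined with time-balanced quota sampling.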


공기중 염화비닐단량체의 포집시 공기 포집량이 파과에 미치는 영향 (Effect of sampling volume on the breakthrough of charcoal tube during vinyl chloride monomer sampling)

  • 윤존중;임남구;김치년;노재훈
    • 한국산업보건학회지 / Vol. 11, No. 3 / pp.241-248 / 2001
  • The main factors affecting breakthrough are known to be sampling time, flow rate, sample concentration, temperature, humidity, and the physical characteristics of the solid sorbent tube. However, no study has reported the effect of temperature and sampling volume on the breakthrough of a charcoal tube during vinyl chloride monomer (VCM) sampling. The objective of this study is to suggest the optimal sampling conditions for VCM sampling based on the National Institute for Occupational Safety and Health (NIOSH) method. To evaluate the adequate sampling volume for VCM without breakthrough, volumes of 1, 2, 3, 4, and 5 L were sampled from VCM at 1, 5, 10, 15, and 20 ppm at a flow rate of 0.05 L/min, at $22^{\circ}C$ and $40^{\circ}C$. At $22^{\circ}C$ and 1, 5, 10, and 15 ppm, VCM was adsorbed completely in the first section of the charcoal tube regardless of sampling volume. At 20 ppm, however, the detection rates were 99.56% in the first section and 0.44% in the second section. At $40^{\circ}C$ and 1 ppm, VCM was adsorbed completely in the first section. At 10, 15, and 20 ppm, the detection rates in the second, third, and fourth sections decreased significantly as the sampling volume was reduced. Judged by the NIOSH breakthrough criterion, no breakthrough occurred at 20 ppm at $22^{\circ}C$. At $40^{\circ}C$, breakthrough occurred at 10, 15, and 20 ppm when the sampling volume was 5 L, but not when it was 3 L. In summary, at temperatures around $22^{\circ}C$, breakthrough may not occur up to 20 ppm during VCM sampling. At temperatures around $40^{\circ}C$, no breakthrough occurred at 1-5 ppm with a sampling volume of 5 L or at 10-20 ppm with a sampling volume of 3 L. These results suggest that sampling volume should be considered when sampling VCM under hot conditions (> $22^{\circ}C$) by NIOSH method No. 1007.


대기 중 오염물질의 시료채취시 관측오차 저감방법에 대한 연구 : 6구형 매니폴더를 장착한 MFC system의 개발과 평가 (Methodological Approaches to Reduce Uncertainties Associated with Air Sampling : Development and Assessment of a Six-port Manifold MFC System)

  • 김기현;오상인;최여진;김민영;최규훈
    • 한국대기환경학회지 / Vol. 19, No. 4 / pp.377-386 / 2003
  • In order to develop a reliable sampling technique, we designed and constructed a 6-port manifold MFC sampling system for collecting gaseous pollutants in air. Using this instrumentation, we tested the performance of the MFC system in terms of (1) flow rate, (2) MFC-to-MFC variability, (3) tube-to-tube variability, and (4) time. Interestingly, the latter two factors did not show any significant variation, while the former two showed substantially large variation. However, because most of this variability was consistent enough to form systematic patterns, we were able to explain the occurrence patterns of the MFC biases in terms of those four variables. The overall results of our experiment suggest that a correction factor should be applied to each MFC unit at a given flow rate to maintain optimal accuracy and precision when sampling these pollutants.