Search | Korea Science

Outlier detection and treatment in industrial sampling survey (경제조사에서의 이상치 탐지와 처리방법)

Joo, Young Sun;Cho, Gyo-Young
- Journal of the Korean Data and Information Science Society / v.27 no.1 / pp.131-142 / 2016
Outliers in surveys can have a large effect on estimates of totals. This is especially true in business surveys, where the populations from which samples are drawn are typically skewed. In this paper, we discuss the practical development and implementation of methods to identify and deal with outliers. The detection method is based on the quartile method, and detected outliers are treated in various ways. The study examines two versions of winsorised estimators, each with three different cut-off thresholds. For the simulation study, four types of weight transformation function are considered.
https://doi.org/10.7465/jkdi.2016.27.1.131
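
As a rough illustration of the quartile-based detection and winsorisation mentioned in the abstract above, the sketch below flags values outside quartile fences and pulls them back to the cut-offs. The fence multiplier `k=1.5`, the function names, and the `turnover` values are assumptions made for illustration; the abstract does not state the paper's thresholds, and its weighted winsorised estimators are not reproduced here.

```python
from statistics import quantiles


def quartile_fences(values, k=1.5):
    """Return (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the common textbook choice, assumed here for illustration.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def winsorise(values, lower, upper):
    """Pull every value outside [lower, upper] back to the nearest cut-off."""
    return [min(max(v, lower), upper) for v in values]


# Made-up survey values with one extreme unit.
turnover = [12, 15, 14, 13, 16, 15, 14, 250]
lower, upper = quartile_fences(turnover)
outliers = [v for v in turnover if v < lower or v > upper]
treated = winsorise(turnover, lower, upper)
```

With these made-up values only the last unit falls outside the fences, so winsorisation changes that value alone and leaves the rest of the sample untouched.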

A Test for Weibull Distribution and Extreme Value Distribution Based on Kullback-Leibler Information (쿨백-레이블러 정보함수에 기초한 와이블분포와 극단값 분포에 대한 적합도 검정)

김종태;이우동
- The Korean Journal of Applied Statistics / v.11 no.2 / pp.351-362 / 1998
In this paper, a test of fit for the Weibull distribution based on the estimated Kullback-Leibler information is proposed. The test uses Vasicek entropy estimates, so to compute it a window size m must first be fixed; critical values are then obtained by Monte Carlo simulation. The power of the proposed test under various alternatives is compared with that of other well-known tests. The use of the test is shown in an illustrative example.
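
The Vasicek entropy estimate referred to above can be sketched as follows. This is the standard spacing estimator with window size `m`, written as a minimal illustration; the function name is an assumption, and the Kullback-Leibler test statistic and its Monte Carlo critical values from the paper are not reproduced.

```python
import math


def vasicek_entropy(sample, m):
    """Vasicek spacing estimate of differential entropy with window size m.

    H = (1/n) * sum_i log( n/(2m) * (X_(i+m) - X_(i-m)) ),
    where order statistics beyond the ends are clipped to X_(1) and X_(n).
    Tied observations can produce a zero spacing, for which the log is undefined.
    """
    x = sorted(sample)
    n = len(x)
    if not 1 <= m < n / 2:
        raise ValueError("window size m must satisfy 1 <= m < n/2")
    total = 0.0
    for i in range(n):
        upper = x[min(i + m, n - 1)]  # X_(i+m), clipped to the largest value
        lower = x[max(i - m, 0)]      # X_(i-m), clipped to the smallest value
        total += math.log(n / (2 * m) * (upper - lower))
    return total / n
```

The window size is the tuning choice the abstract says must be fixed first: a larger `m` smooths the spacings but increases the effect of clipping at the ends of the sample.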

Estimation of Car Insurance Loss Ratio Using the Peaks over Threshold Method (POT방법론을 이용한 자동차보험 손해율 추정)

Kim, S.Y.;Song, J.
- The Korean Journal of Applied Statistics / v.25 no.1 / pp.101-114 / 2012
In car insurance, the loss ratio is the ratio of total losses paid out in claims to total earned premiums. To minimize the loss to the insurance company, it is necessary to estimate extreme quantiles of the loss ratio distribution, because the loss ratio carries essential profit and loss information. Like other insurance-related datasets, the loss ratio has a heavy-tailed distribution. The Peaks over Threshold (POT) method and the Hill estimator are commonly used to estimate extreme quantiles of heavy-tailed distributions. This article compares the performance of various parameter estimation methods using a simulation and real loss ratio data from car insurance. In addition, we estimate extreme quantiles using the Hill estimator. The simulation and the loss ratio data applications demonstrate that the POT method estimates quantiles more accurately than the Hill estimation method in most cases. Moreover, the MLE, Zhang, and NLS-2 methods show the best performance among the methods for estimating the GPD parameters.
https://doi.org/10.5351/KJAS.2012.25.1.101
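
For orientation, the Hill estimator named in the abstract above can be sketched as below, together with a Weissman-type extreme quantile built on it. The function names and the choice of `k` (the number of upper order statistics) are assumptions for illustration; the paper's GPD fitting methods (MLE, Zhang, NLS-2) are not reproduced.

```python
import math


def hill_estimator(sample, k):
    """Hill estimate of the tail index gamma = 1/alpha from the k largest values."""
    x = sorted(sample, reverse=True)
    if not 1 <= k < len(x):
        raise ValueError("k must satisfy 1 <= k < n")
    threshold = x[k]  # the (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("the Hill estimator needs a positive threshold")
    return sum(math.log(v / threshold) for v in x[:k]) / k


def hill_quantile(sample, k, p):
    """Weissman-type estimate of the level exceeded with probability p."""
    x = sorted(sample, reverse=True)
    n = len(x)
    gamma = hill_estimator(sample, k)
    return x[k] * (k / (n * p)) ** gamma
```

In practice the estimate is sensitive to `k`, so it is usually inspected over a range of `k` values rather than computed at a single one.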

The fundamental frequency (f0) distribution of Korean speakers in a dialogue corpus using Praat and R (Praat과 R로 분석한 한국인 대화 음성 말뭉치의 fundamental frequency(f0)값 분포)

Byunggon Yang
- Phonetics and Speech Sciences / v.15 no.3 / pp.17-25 / 2023
This study examines the fundamental frequency (f0) distribution of 2,740 Korean speakers in a dialogue speech corpus. Praat and R were used to collect and analyze the acoustical f0 data after removing extreme values based on the interquartile f0 range of the intonational phrases produced by each individual speaker. Results showed that the average f0 value of all speakers was 185 Hz and the median value was 187 Hz. The f0 data showed a slight positive skewness of 0.11 and a kurtosis of -0.09, which is close to the normal distribution. The pitch values of daily conversations varied over a range of 238 Hz. Further examination of the male and female groups showed distinct median f0 values: 114 Hz for males and 199 Hz for females. A t-test between the two groups yielded a significant difference. The skewness representing the distribution shape was 1.24 for the male group and 0.58 for the female group. The kurtosis was 5.21 and 3.88 for the male and female groups, respectively, and the male group values appeared leptokurtic. A regression analysis between the median f0 and age yielded a slope of 0.15 for the male group and -0.586 for the female group, which indicated a divergent relationship. In conclusion, a normative f0 distribution of different Korean age and sex groups can be examined in a conversational speech corpus recorded by a very large number of participants. However, more rigorous data might be required to define the relation between age and f0 values.
https://doi.org/10.13064/KSSS.2023.15.3.017
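
The shape statistics quoted in the abstract above can be illustrated with a minimal moment-based sketch. The study itself used R; Python is used here only for illustration, and the function name is an assumption. The abstract does not say which kurtosis convention was used, so the sketch returns excess kurtosis (0 for a normal distribution).

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based sample skewness and excess kurtosis (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis
```

Statistical packages often apply small-sample bias corrections to these quantities, so values from R may differ slightly from this plain moment version.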

Statistical Analysis of Extreme Values of Financial Ratios (재무비율의 극단치에 대한 통계적 분석)

Joo, Jihwan
- Knowledge Management Research / v.22 no.2 / pp.247-268 / 2021
Investors mainly use PER and PBR among financial ratios for valuation and investment decision-making. I analyze these two basic financial ratios from a statistical perspective. Financial ratios contain key accounting numbers that reflect firm fundamentals and are useful for valuation or risk analysis such as enterprise credit evaluation and default prediction. The distribution of financial data tends to be extremely heavy-tailed; PER and PBR show exceedingly high levels of kurtosis, and their extreme cases often contain significant information on financial risk. In this respect, Extreme Value Theory is required to fit their right tails more precisely. I introduce not only GPD but also exGPD. GPD is the conventionally preferred model in Extreme Value Theory, and exGPD is the log-transformed distribution of GPD. exGPD was recently proposed as an alternative to GPD (Lee and Kim, 2019). First, I conduct a simulation to compare the performance of the two distributions using goodness-of-fit measures and the estimation of the 90th to 99th percentiles. I also conduct an empirical analysis of Information Technology firms in Korea. Finally, exGPD shows better performance, especially for PBR, suggesting that exGPD could be an alternative to GPD for the analysis of financial ratios.
https://doi.org/10.15813/kmr.2021.22.2.013
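
Because the abstract above describes exGPD as the log-transformed distribution of GPD, its quantiles follow directly from GPD quantiles. The sketch below shows that relation for a GPD with threshold zero; the function names and the parameter names `sigma` (scale) and `xi` (shape) are assumptions for illustration, not the paper's notation.

```python
import math


def gpd_quantile(p, sigma, xi):
    """Quantile of a generalized Pareto distribution with threshold 0."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if xi == 0:
        return -sigma * math.log(1 - p)  # exponential limit
    return sigma / xi * ((1 - p) ** (-xi) - 1)


def exgpd_quantile(p, sigma, xi):
    """Quantile of Y = log(X) for X ~ GPD: the log is monotone, so quantiles map directly."""
    return math.log(gpd_quantile(p, sigma, xi))
```

The same mapping works for random draws: taking the log of GPD samples yields exGPD samples, which is convenient when comparing upper percentiles across the two models.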

Comparison Study of Parameter Estimation Methods for Some Extreme Value Distributions (Focused on the Regression Method) (극단치 분포의 모수 추정방법 비교 연구(회귀 분석법을 기준으로))

Woo, Ji-Yong;Kim, Myung-Suk
- Communications for Statistical Applications and Methods / v.16 no.3 / pp.463-477 / 2009
Parameter estimation methods such as maximum likelihood estimation, the probability weighted moments method, and the regression method have been widely applied to various extreme value models in the literature. Among these three methods, the performance of the regression method has not yet been rigorously investigated. In this paper, the regression method is compared with the other methods via Monte Carlo simulation studies for estimating the parameters of the Generalized Extreme Value (GEV) distribution and the Generalized Pareto (GP) distribution. Our simulation results indicate that the regression method tends to outperform the other methods in small samples, providing smaller biases and root mean square errors for the location parameter of the GEV model. For the scale parameter of the GP model in small samples, the regression method tends to report smaller biases than the other methods. The regression method tends to be superior to the other methods for estimating the shape parameter of the GEV and GP models when the shape parameter is -0.4, in small and moderately large samples.
https://doi.org/10.5351/CKSS.2009.16.3.463
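
The two comparison criteria named in the abstract above, bias and root mean square error, can be computed from Monte Carlo replicates as in the sketch below. The function name and the example estimates are hypothetical; the estimation methods themselves (regression, maximum likelihood, probability weighted moments) are not implemented here.

```python
import math


def bias_and_rmse(estimates, true_value):
    """Bias and root mean square error of replicated estimates of one parameter."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / n)
    return bias, rmse


# Hypothetical shape estimates from three simulated samples, true shape -0.4.
bias, rmse = bias_and_rmse([-0.5, -0.4, -0.3], -0.4)
```

A simulation study would call this once per method, parameter, and sample size, with each list holding one estimate per simulated sample.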

An Analysis of Daily Maximum Traffic Accident Using Generalized Extreme Value Distribution (일반화 극단치분포를 이용한 일 최대 교통사고 분석)

Kim, Junseok;Kim, Daesung;Yoon, Sanghoo
- Journal of Digital Convergence / v.18 no.10 / pp.33-39 / 2020
To cope with traffic accidents efficiently, the maximum number of traffic accidents, deaths, and serious injuries that can occur in a day should be presented quantitatively. To examine the characteristics of traffic accidents in different regions, the data were divided into the Seoul metropolitan area, Chungcheong area, Gyeongbuk area, Honam area, and Gyeongnam area, and each was fitted to the generalized extreme value (GEV) distribution. The parameters of the GEV distribution were estimated by the method of L-moments, and the Anderson-Darling test and the Cramer-von Mises test confirmed the goodness of fit of the distribution. According to the analysis, the maximum number of traffic accidents that can occur once every 50 years is 401 in the Seoul metropolitan area, 168 in the South Gyeongsang region, 455 in the North Gyeongsang region, 136 in the Chungcheong region, and 205 in the South Jeolla region. Compared with the Seoul metropolitan area, which has a large population and many registered cars, the number of traffic accidents is relatively high because of large land area, mountainous terrain, and logistics traffic generated by industrial complexes.
https://doi.org/10.14400/JDC.2020.18.10.033
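
The "once every 50 years" figures in the abstract above are return levels of a fitted GEV distribution. A minimal sketch of that calculation follows, assuming the parameterization in which a positive shape `xi` means a heavy upper tail (some L-moment references use the opposite sign). The function names are assumptions, and the L-moment fitting step is not reproduced.

```python
import math


def gev_return_level(T, mu, sigma, xi):
    """Level exceeded on average once every T blocks under GEV(mu, sigma, xi)."""
    if T <= 1:
        raise ValueError("return period T must exceed 1")
    y = -math.log(1 - 1 / T)
    if xi == 0:
        return mu - sigma * math.log(y)  # Gumbel limit
    return mu - sigma / xi * (1 - y ** (-xi))


def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function, used here to check the return level."""
    if xi == 0:
        return math.exp(-math.exp(-(z - mu) / sigma))
    t = 1 + xi * (z - mu) / sigma
    if t <= 0:
        return 0.0 if xi > 0 else 1.0  # z lies outside the support
    return math.exp(-t ** (-1 / xi))
```

By construction, the distribution function evaluated at the T-block return level equals 1 - 1/T, which gives a quick consistency check on any fitted parameter set.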

A Bayesian Analysis of Return Level for Extreme Precipitation in Korea (한국지역 집중호우에 대한 반환주기의 베이지안 모형 분석)

Lee, Jeong Jin;Kim, Nam Hee;Kwon, Hye Ji;Kim, Yongku
- The Korean Journal of Applied Statistics / v.27 no.6 / pp.947-958 / 2014
Understanding extreme precipitation events is very important for flood planning. In particular, the r-year return level is a common measure of extreme events. In this paper, we present a spatial analysis of precipitation return levels using hierarchical Bayesian modeling. For intensity, we model annual maximum daily precipitation and daily precipitation above a high threshold at 62 stations in Korea with the generalized extreme value (GEV) distribution and the generalized Pareto distribution (GPD), respectively. The spatial dependence among return levels is incorporated into the model through a latent Gaussian process on the GEV and GPD model parameters. We apply the proposed model to precipitation data collected at 62 stations in Korea from 1973 to 2011.
https://doi.org/10.5351/KJAS.2014.27.6.947
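
For the threshold-exceedance part of the model described above, an r-year return level can be written in closed form once the GPD parameters are known. The sketch below is a plain, non-Bayesian point calculation; the function name and the parameter names (`u` for the threshold, `zeta_u` for the probability that a daily value exceeds `u`, `obs_per_year` for the number of daily observations per year) are assumptions for illustration. The hierarchical model and latent Gaussian process from the paper are not reproduced.

```python
import math


def gpd_return_level(years, obs_per_year, u, sigma, xi, zeta_u):
    """Level exceeded on average once every `years` years under a GPD tail above u."""
    m = years * obs_per_year  # number of observations in the return period
    if m * zeta_u <= 1:
        raise ValueError("return period too short: the level would fall below u")
    if xi == 0:
        return u + sigma * math.log(m * zeta_u)  # exponential-tail limit
    return u + sigma / xi * ((m * zeta_u) ** xi - 1)
```

In a Bayesian analysis the same formula would be evaluated on each posterior draw of the parameters, which yields a posterior distribution for the return level rather than a single number.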

A Study on the Comovement of Industry Default (산업 부도의 동조화 현상 연구)

Jeon, Haehyun;Kim, So-Yeun;Kim, Changki
- The Korean Journal of Applied Statistics / v.28 no.6 / pp.1289-1312 / 2015
This paper studies the comovement of industry defaults among listed companies. The rank correlation coefficients Spearman's $\rho$ and Kendall's $\tau$ measure the concordance of defaults. These non-parametric coefficients do not require distributional assumptions and are easy to use even with small samples and extreme values. This study predicts future financial crises by looking at the comovement of industry defaults. We expect our analyses to aid market participants (including company executives) in making investment or risk management decisions.
https://doi.org/10.5351/KJAS.2015.28.6.1289
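
The two rank correlations named in the abstract above can be sketched in a few lines. The function names are assumptions for illustration; Spearman's coefficient is computed as the Pearson correlation of average ranks, and Kendall's coefficient is the tau-a version, which does not adjust for ties.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)
```

Both functions depend only on the ordering of the data, which is why a single extreme default count changes them far less than it would change a Pearson correlation.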

Drought Frequency Analysis for Effective Drought Index using Boundary Kernel Function (경계핵밀도함수를 이용한 Effective Drought Index 지수의 가뭄빈도해석)

Oh, Tae-Suk;Moon, Young-Il;Kwon, Hyun-Han;Kim, Seong-Sil
- Proceedings of the Korea Water Resources Association Conference / 2010.05a / pp.1775-1779 / 2010
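Climate change driven by recent global warming is increasing the frequency of extreme events such as floods and droughts. Drought in particular is one of the representative natural disasters that cause damage over a long period. This study therefore performed a drought frequency analysis that can quantify the magnitude and severity of drought. For the analysis, the EDI drought index was computed at 61 sites in Korea. Annual minimum values were extracted from the daily EDI series, and a frequency analysis was performed on the extracted EDI data. The frequency analysis used a boundary kernel density function, which has advantages such as the ability to represent mixed probability distributions. The results showed that, at return periods of 5 to 10 years, the drought index takes values of -2.0 or below, which indicates extreme dryness. Drought can therefore recur, on average, with a return period of between 5 and 10 years. Consequently, a continuous drought monitoring system should be established so that drought damage can be minimized.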

