• Title/Summary/Keyword: Skewed Data

An Enhanced Two Dimensional Histogram Method Utilizing Dense Regions (고 밀도 영역을 이용한 향상된 2차원 히스토그램 기법)

  • Roh, Yo-Han;Chung, Yon-Dohn;Ghim, Ho-Jin;Kim, Myoung-Ho
    • Journal of KIISE:Databases / v.35 no.6 / pp.544-554 / 2008
  • Histograms are popularly used for selectivity estimation in database systems. In conventional histogram methods, buckets return the approximated results based on the assumption that all objects in a bucket are uniformly distributed. However, the objects within the region of a query are not likely to be uniformly distributed. That is, there can be some skews (i.e., clusters) in the buckets, which may significantly degrade the accuracy of the histogram. The aim of this work is to enhance the accuracy of histograms. For this purpose, we propose a new two-dimensional histogram method considering clusters. The proposed method detects dense regions and exploits them for organizing buckets. Since the proposed method effectively reduces accuracy degradation caused by clusters, it can provide improved, robust accuracy against skewed data distributions. Through experiments, we show that the proposed method provides up to 74% improved performance compared with the conventional histogram.
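The uniformity assumption described in this abstract can be sketched as follows. This is a minimal illustration on an equi-width grid over the unit square, not the authors' dense-region method; the function names, grid size, and synthetic data are all assumptions made for the example.

```python
import random

def build_grid_histogram(points, nx, ny):
    """Count points per cell of an nx-by-ny equi-width grid over the unit square."""
    counts = [[0] * ny for _ in range(nx)]
    for x, y in points:
        i = min(int(x * nx), nx - 1)
        j = min(int(y * ny), ny - 1)
        counts[i][j] += 1
    return counts

def estimate_count(counts, query):
    """Estimate how many points fall in query=(x0, y0, x1, y1),
    assuming points are uniformly spread inside every bucket."""
    nx, ny = len(counts), len(counts[0])
    x0, y0, x1, y1 = query
    total = 0.0
    for i in range(nx):
        for j in range(ny):
            # Side lengths of the overlap between the query and bucket (i, j).
            ox = max(0.0, min(x1, (i + 1) / nx) - max(x0, i / nx))
            oy = max(0.0, min(y1, (j + 1) / ny) - max(y0, j / ny))
            # Overlap area divided by bucket area, times the bucket count.
            total += counts[i][j] * (ox * oy) * nx * ny
    return total

def true_count(points, query):
    """Exact number of points inside the query rectangle."""
    x0, y0, x1, y1 = query
    return sum(1 for x, y in points if x0 <= x <= x1 and y0 <= y <= y1)

# Synthetic data (hypothetical): uniform background plus one tight cluster.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(1000)]
points += [(rng.uniform(0.02, 0.08), rng.uniform(0.02, 0.08)) for _ in range(1000)]

hist = build_grid_histogram(points, 4, 4)
query = (0.15, 0.15, 0.25, 0.25)  # same bucket as the cluster, but misses it
est = estimate_count(hist, query)
actual = true_count(points, query)
print(f"estimated {est:.1f}, actual {actual}")
```

Because the cluster inflates the count of the bucket it sits in, a query that touches that bucket but not the cluster is overestimated, which is the kind of degradation the abstract attributes to skew.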

A Study on the Rating of the Insureds' Anthropometric Data I. Build (피보험체계측치(被保險體計測値)의 평가(評價)에 관한 연구(硏究) 제1보(第1報) 체격(體格))

  • Im, Young-Hoon
    • The Journal of the Korean life insurance medical association / v.3 no.1 / pp.103-141 / 1986
  • The present study was undertaken to establish a decision standard of build for insured persons, using the ratio of weight-for-height as the build index. The material examined was the ratio of weight-for-height calculated from the measured heights and weights of 15,838 insured persons who were medically examined at the Honam Medical Department of Dong Bang Life Insurance Company, Ltd. from June 1979 to September 1985. The ratio of weight-for-height is calculated by the following formula: ratio of weight-for-height (%) = $\frac{\text{weight (kg)} \times 100}{\{\text{height (cm)} - 100\} \times 0.9\ \text{(kg)}}$. The results were as follows: 1. The distribution of the ratio of weight-for-height of the 15,838 insureds follows a log-normal distribution skewed to the left (the direction of underweight). 2. The ratios of weight-for-height were log-transformed to obtain a symmetrical distribution, to which statistical rules are known to apply more exactly. Thereafter, the decision standard of build was established using the log of the ratio of weight-for-height as the build index. For males of all ages, the ratio of weight-for-height indicating the range of standard lives, which includes slightly overweight and underweight lives besides normal lives, is 80-130%, corresponding to $M-2\delta$ to $M+1.5\delta$ and to $M \pm 20\%$; for females it is 85-135%, corresponding to $M-2\delta$ to $M+1.5\delta$ and to $M \pm 20\%$. For males of all ages, the ratio of weight-for-height indicating the initial level of super-overweight and super-underweight lives is 130-150% and 75-80%, corresponding to $M+3\delta$ and $M-3\delta$ and to M+40% and M-25%, respectively; for females it is 140-160% and 75-80%, corresponding to $M+3\delta$ and $M-3\delta$ and to M+40% to +50% and M-25%, respectively. 3. The author's rating table model for builds (a table of weight per height) is proposed. The table lists the ratings for builds, i.e., standard, super-overweight, and super-underweight lives.
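The formula in this abstract can be worked through in a few lines. The sketch below assumes the formula as reconstructed from the abstract and uses the standard ranges it reports (80-130% for males, 85-135% for females); the function names and the inclusive treatment of the range boundaries are assumptions for illustration.

```python
def weight_for_height_ratio(weight_kg, height_cm):
    """Ratio of weight-for-height in percent:
    weight * 100 / ((height - 100) * 0.9)."""
    return weight_kg * 100.0 / ((height_cm - 100.0) * 0.9)

# Standard-life ranges quoted in the abstract (percent).
STANDARD_RANGE = {"male": (80.0, 130.0), "female": (85.0, 135.0)}

def is_standard_build(weight_kg, height_cm, sex):
    """True if the ratio lies in the reported standard range.
    Treating both boundaries as inclusive is an assumption."""
    low, high = STANDARD_RANGE[sex]
    return low <= weight_for_height_ratio(weight_kg, height_cm) <= high

# A 170 cm person weighing 63 kg sits at the reference weight (ratio of 100%).
print(round(weight_for_height_ratio(63, 170), 1))
print(is_standard_build(63, 170, "male"))
```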

The Decline of Health-Related Quality of Life Associated with Some Diseases in Korean Adults (우리나라 성인에서 일부 질환과 연관된 건강관련 삶의 질 감소)

  • Kil, Seol-Ryoung;Lee, Sang-Il;Yun, Sung-Cheol;An, Hyung-Mi;Jo, Min-Woo
    • Journal of Preventive Medicine and Public Health / v.41 no.6 / pp.434-441 / 2008
  • Objectives: This study was conducted to measure the decline in the health-related quality of life (HRQoL) associated with some diseases in South Korean adults. Methods: The EQ-5D health states in the 2005 National Health and Nutrition Examination Survey (NHNES) and the Korean EQ-5D valuation set were used to obtain the EQ-5D indexes of the study subjects. A subject was assigned to a disease group if they reported to the NHNES that a physician had diagnosed them with the corresponding disease during the previous year. Since the distributions of the EQ-5D indexes in each subgroup were negatively skewed, median regression analysis was used to estimate the effects of specific diseases on the HRQoL. Median regression analysis produces estimates that approximate the median of the EQ-5D indexes and is more robust for analyzing data with many outliers. Results: A total of 16,692 subjects (6,667 patients and 10,025 people without any disease) were included in the analysis. In the median regression analysis, stroke had the strongest impact on the HRQoL for both males and females, followed by osteoporosis, osteoarthritis, rheumatic arthritis, and herniation of an intervertebral disc. While asthma had a significant impact on the HRQoL only in men, cataract, temporo-mandibular dysfunction, and peptic ulcer significantly affected the HRQoL only in women. Conclusions: Stroke and musculoskeletal diseases were associated with the largest losses of the HRQoL in Korean adults.
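To illustrate why median regression is more robust here, consider the simplest case of one 0/1 disease indicator. With a single binary covariate the absolute-error loss splits by group, so the median-regression fit reduces to the two group medians. The index values below are invented for illustration and are not the survey data; the function names are likewise hypothetical.

```python
from statistics import mean, median

def median_regression_binary(y, d):
    """Least-absolute-deviation fit of y ~ a + b*d for a 0/1 indicator d.
    The loss decouples by group, so a is the median of the d=0 group
    and a + b is the median of the d=1 group."""
    g0 = [v for v, flag in zip(y, d) if flag == 0]
    g1 = [v for v, flag in zip(y, d) if flag == 1]
    a = median(g0)
    return a, median(g1) - a

def mean_regression_binary(y, d):
    """Ordinary least squares for the same model: group means."""
    g0 = [v for v, flag in zip(y, d) if flag == 0]
    g1 = [v for v, flag in zip(y, d) if flag == 1]
    a = mean(g0)
    return a, mean(g1) - a

def abs_loss(y, d, a, b):
    """Sum of absolute residuals for the fit (a, b)."""
    return sum(abs(v - (a + b * flag)) for v, flag in zip(y, d))

# Made-up, negatively skewed index values; one extreme low value in the disease group.
index = [1.0, 1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.85, -0.1]
disease = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

a_med, b_med = median_regression_binary(index, disease)
a_mean, b_mean = mean_regression_binary(index, disease)
print(f"median regression effect: {b_med:.2f}")
print(f"mean regression effect:   {b_mean:.2f}")
```

The single outlier pulls the mean-based effect well below the median-based one, while the median fit is unchanged by how extreme that value is.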

A Study on Statistical Forecasting Models of PM10 in Pohang Region by the Variable Transformation (변수변환을 통한 포항지역 미세먼지의 통계적 예보모형에 관한 연구)

  • Lee, Yung-Seop;Kim, Hyun-Goo;Park, Jong-Seok;Kim, Hee-Kyung
    • Journal of Korean Society for Atmospheric Environment / v.22 no.5 / pp.614-626 / 2006
  • Using data from three environmental monitoring sites in the Pohang area (KME112, KME113, and KME114), statistical forecasting models of the daily maximum and mean values of PM10 were developed. Since the distributions of the daily maximum and mean PM10 values are skewed, resembling the Weibull distribution, these values were log-transformed to approximate the normal distribution and thereby increase prediction accuracy. Three statistical forecasting models, namely regression, neural networks (NN), and support vector regression (SVR), were built using the log-transformed response variables, i.e., log(max(PM10)) or log(mean(PM10)). The forecasting models were validated with RMSE, CORR, and IOA for model comparison and accuracy. The improvement in IOA from the log-transformation in the daily maximum PM10 prediction was 12.7% for regression and 22.5% for NN; for SVR in particular, it was 42.7%. For the daily mean PM10 prediction, IOA improved by 5.1% for regression, 6.5% for NN, and 6.3% for SVR. In conclusion, SVR was found to perform better than the other methods in terms of model accuracy and fit.
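The three validation measures named in this abstract can be sketched as below. The abstract does not define them, so this assumes RMSE is the root mean squared error, CORR is the Pearson correlation, and IOA is Willmott's index of agreement; the sample numbers are invented.

```python
import math

def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))

def corr(obs, pred):
    """Pearson correlation coefficient (undefined if either series is constant)."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

def ioa(obs, pred):
    """Willmott's index of agreement (assumed definition):
    1 - sum((P - O)^2) / sum((|P - mean(O)| + |O - mean(O)|)^2)."""
    mo = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den

# Invented PM10-like values; predictions are made on the log scale
# and back-transformed with exp before scoring.
observed = [35.0, 48.0, 60.0, 120.0, 42.0]
log_predicted = [3.6, 3.8, 4.2, 4.6, 3.7]
predicted = [math.exp(v) for v in log_predicted]

print(round(rmse(observed, predicted), 2),
      round(corr(observed, predicted), 3),
      round(ioa(observed, predicted), 3))
```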

LES for Turbulent Duct Flow with Mass injection (덕트내부에서 질량분사가 있는 난류유동의 LES 해석)

  • Kim, Bo-Hoon;Na, Yang;Lee, Chang-Jin
    • Proceedings of the Korean Society of Propulsion Engineers Conference / 2010.05a / pp.210-213 / 2010
  • Recent experimental data show that irregular roughened spots appear on the fuel surface during combustion tests. These unexpected patterns are likely to result from the boundary layer being disturbed by wall blowing, which is intended to simulate the process of fuel vaporization. LES without chemical reaction was conducted at a Reynolds number of 23,000 to investigate the flow characteristics near the fuel surface and the behavior of turbulent structures as they evolve under wall blowing. A cylindrical geometry was considered to make the calculation as realistic as possible, because real hybrid rocket motors have a circular grain configuration. It was shown that the wall blowing pushed turbulent structures upwards, tilting them, and that this skewed displacement, in effect, left the footprints of the structures on the surface. This change of kinematics may explain the formation of irregular isolated spots on the fuel surface observed in the experiment.

Estimating Price Elasticity of Residential Water Demand in Korea Using Panel Quantile Model (패널 분위수회귀 모형을 사용한 우리나라 지방 상수도 생활용수 수요의 가격탄력성 추정)

  • Kim, Hyung-Gun
    • Environmental and Resource Economics Review / v.27 no.1 / pp.195-214 / 2018
  • This study estimates the price elasticity of residential water demand in Korea. To this end, annual panel data from 2010 to 2013 for 161 local water services are analyzed using a panel quantile model. The price elasticities of residential water demand in Korea are estimated to be between -0.156 and -0.189, depending on the quantile. In addition, the study finds that the elasticity estimated by traditional conditional mean regression is relatively more influenced by high-demand areas because the distribution of residential water demand in Korea is left-skewed.
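Quantile models differ from conditional mean regression in the loss they minimize. The sketch below shows the check (pinball) loss for a constant-only model, where the minimizer is an empirical quantile; it is not the paper's panel estimator, and the data and function names are invented for illustration.

```python
def check_loss(residual, tau):
    """Pinball (check) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return tau * residual if residual >= 0 else (tau - 1) * residual

def best_constant(values, tau):
    """Data value c that minimizes the total check loss of (y - c)."""
    return min(values, key=lambda c: sum(check_loss(y - c, tau) for y in values))

# Invented demand figures with one very large value.
demand = [1, 2, 3, 4, 5, 6, 7, 8, 100]

mean_fit = sum(demand) / len(demand)
print("mean:", round(mean_fit, 2))            # pulled toward the extreme value
print("tau=0.50:", best_constant(demand, 0.5))
print("tau=0.75:", best_constant(demand, 0.75))
```

Fitting several values of tau gives one estimate per quantile, which is how a quantile model can report a range of elasticities rather than a single mean-based figure.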

Selectivity Estimation Using Compressed Spatial Histogram (압축된 공간 히스토그램을 이용한 선택율 추정 기법)

  • Chi, Jeong-Hee;Lee, Jin-Yul;Kim, Sang-Ho;Ryu, Keun-Ho
    • The KIPS Transactions:PartD / v.11D no.2 / pp.281-292 / 2004
  • Selectivity estimation for spatial queries is a very important process in finding the most efficient execution plan. Much work has been done on accurate selectivity estimation. Although existing methods address problems such as false-count and multi-count, they cannot achieve these effects in a small memory space. Therefore, we propose a new technique called the MW Histogram, which compresses summary data while giving reasonable results and has a flexible structure that can react to dynamic updates. Our method is based on two techniques: (a) the MinSkew partitioning algorithm, which deals with skewed spatial datasets efficiently, and (b) the wavelet transformation, whose compression effect is proven. The experimental results showed that the MW Histogram with a ratio of buckets to wavelet coefficients of 0.3 has a lower relative error than the MinSkew Histogram for queries of about 5%-20%, demonstrating that the MW Histogram achieves good selectivity estimates in little memory.
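The wavelet half of this idea can be sketched on a one-dimensional array of bucket counts. This is a generic unnormalized Haar transform with simple thresholding, not the MW Histogram itself; the function names, the input counts, and the rule of keeping the largest raw coefficients are assumptions for illustration.

```python
def haar_forward(values):
    """Unnormalized Haar decomposition; len(values) must be a power of two.
    Returns [overall average, detail coefficients from coarse to fine]."""
    data = list(values)
    details = []
    while len(data) > 1:
        avgs = [(data[i] + data[i + 1]) / 2 for i in range(0, len(data), 2)]
        diffs = [(data[i] - data[i + 1]) / 2 for i in range(0, len(data), 2)]
        details = diffs + details  # finer levels end up to the right
        data = avgs
    return data + details

def haar_inverse(coeffs):
    """Rebuild the original values from haar_forward output."""
    data = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(data)]
        pos += len(data)
        data = [v for a, d in zip(data, diffs) for v in (a + d, a - d)]
    return data

def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients (always keeps the average)."""
    order = sorted(range(1, len(coeffs)), key=lambda i: -abs(coeffs[i]))
    kept = {0, *order[:max(0, k - 1)]}
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]

bucket_counts = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_forward(bucket_counts)
approx = haar_inverse(keep_largest(coeffs, 4))  # store 4 numbers instead of 8
print(coeffs)
print(approx)
```

Keeping the average coefficient preserves the total count, so the compressed summary still answers a whole-domain query exactly while approximating smaller ranges.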

A Study on Prediction Model for Laundry and Toilet Water-use demand (세탁기 및 화장실 용수 수요량에 대한 예측모형 연구)

  • Myoung, Sung-Min
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.12 no.4 / pp.327-335 / 2019
  • This study develops a prediction model for toilet and laundry water end-uses based on survey data that measured housing and household characteristics of 140 households over 5 years in Korea. A classical regression model assuming a normal distribution was not appropriate, and its estimated parameters were biased, because the distribution of measured water use was left-skewed. As an alternative, we considered the Weibull and lognormal distributions for each water use, and three regression models were compared using the log-likelihood and scale parameter. As a result, Weibull regression was chosen as appropriate for both water uses, and the factors that affect each water use were identified. These results are expected to provide insight into water resource utilization and theoretical support for effective water resource management.
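Comparing candidate distributions by log-likelihood can be sketched without covariates. The example below fits a Weibull and a lognormal distribution by maximum likelihood to synthetic positive data and compares the fitted log-likelihoods. It is not the paper's regression model; the function names, the bisection bounds, and the simulated data are assumptions.

```python
import math
import random

def fit_lognormal(xs):
    """Closed-form lognormal MLE; returns (mu, sigma, log-likelihood)."""
    logs = [math.log(x) for x in xs]
    n = len(xs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    ll = sum(-math.log(x * sigma * math.sqrt(2 * math.pi))
             - (lx - mu) ** 2 / (2 * sigma ** 2) for x, lx in zip(xs, logs))
    return mu, sigma, ll

def fit_weibull(xs, lo=0.05, hi=20.0, iters=60):
    """Weibull MLE; the shape k solves
    sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0, found by bisection.
    The bounds lo and hi are assumed to bracket the solution."""
    logs = [math.log(x) for x in xs]
    n = len(xs)
    mean_log = sum(logs) / n

    def score(k):
        xk = [x ** k for x in xs]
        return sum(a * l for a, l in zip(xk, logs)) / sum(xk) - 1.0 / k - mean_log

    for _ in range(iters):
        mid = (lo + hi) / 2
        if score(mid) > 0:   # score is increasing in k
            hi = mid
        else:
            lo = mid
    k = (lo + hi) / 2
    lam = (sum(x ** k for x in xs) / n) ** (1.0 / k)
    ll = sum(math.log(k / lam) + (k - 1) * math.log(x / lam) - (x / lam) ** k
             for x in xs)
    return k, lam, ll

# Synthetic "water use" values drawn from a Weibull with scale 100 and shape 1.5.
rng = random.Random(0)
usage = [rng.weibullvariate(100.0, 1.5) for _ in range(1000)]

shape, scale, ll_weibull = fit_weibull(usage)
_, _, ll_lognormal = fit_lognormal(usage)
print(f"Weibull shape {shape:.2f}, scale {scale:.1f}")
print(f"log-likelihood: Weibull {ll_weibull:.1f}, lognormal {ll_lognormal:.1f}")
```

The distribution with the higher log-likelihood fits the sample better; both candidates here have two parameters, so no penalty for model size is needed in this comparison.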

Statistical Analysis on Water Quality Characteristics of Large Lakes in Korea (우리나라 주요 호소의 수질특성에 대한 통계적 분석)

  • Kong, Dongsoo
    • Journal of Korean Society on Water Environment / v.35 no.2 / pp.165-180 / 2019
  • Water quality data of 81 lakes in Korea from 2013 to 2017 were analyzed. Most water quality parameters showed a left-skewed distribution, while dissolved oxygen showed a normal distribution. pH and dissolved oxygen showed a positive correlation with organic matter and nutrients, which appeared to be a nonsense correlation mediated by algae. The ratios of $BOD_5$ and $COD_{Mn}$ to CBOD were 21% and 52%, respectively, in the freshwater lakes. TOC concentration appeared to be underestimated by the UV digestion method when salinity exceeded $700\ {\mu}S\ cm^{-1}$. In terms of the nitrogen/phosphorus ratio, the limiting factor for algal growth seemed to be phosphorus in most of the lakes. Chlorophyll $a$ increased sharply as the N/P ratio decreased. However, this seemed to be a nonsense correlation mediated by phosphorus concentration, since the N/P ratio depended on phosphorus. The N/P ratio of brackish lakes was lower than that of freshwater lakes at the same concentration of phosphorus. Denitrification occurring in the bottom layer and sediment during saline stratification is worth examining. Chl. $a$ concentration decreased as a power function of increasing mean depth. The primary reason is that deep lakes are mainly located in the less-disturbed upstream areas. However, it is necessary to investigate the effect of sediment on water quality in shallow lakes. Light attenuation in the upper layer was dominated by tripton (non-algal suspended solids) absorption/scattering (average relative contribution of 39%), followed by CDOM (colored dissolved organic matter) (average 37%) and Chl. $a$ (average 21%).

Indian Research on Artificial Neural Networks: A Bibliometric Assessment of Publications Output during 1999-2018

  • Gupta, B.M.;Dhawan, S.M.
    • International Journal of Knowledge Content Development & Technology / v.10 no.4 / pp.29-46 / 2020
  • The paper describes the quantitative and qualitative dimensions of artificial neural networks (ANN) research in India in the global context. The study is based on research publications data (8,260 publications) as covered in the Scopus database during 1999-2018. ANN research in India registered 24.52% growth, averaged 11.95 citations per paper, and contributed a 9.77% share to global ANN research. ANN research is skewed, as the top 10 countries account for 75.15% of global output. India ranks as the third most productive country in the world. The distribution of research by type of ANN reveals that the Feed Forward Neural Network type accounted for the highest share (10.18%), followed by Adaptive Weight Neural Network (5.38%), Feed Backward Neural Network (2.54%), etc. ANN research applications across subjects were the largest in medical science and environmental science (11.82% and 10.84% share, respectively), followed by materials science, energy, chemical engineering, and water resources (from 6.36% to 9.12%), etc. The Indian Institute of Technology, Kharagpur and the Indian Institute of Technology, Roorkee lead the country as the most productive organizations (with 289 and 264 papers). In addition, the Indian Institute of Technology, Kanpur (33.04 and 2.76) and the Indian Institute of Technology, Madras (24.26 and 2.03) lead the country as the most impactful organizations in terms of citations per paper and relative citation index. P. Samui and T.N. Singh have been the most productive authors, and G.P.S. Raghava (86.21 and 7.21) and K.P. Sudheer (84.88 and 7.1) have been the most impactful authors. Neurocomputing, International Journal of Applied Engineering Research, and Applied Soft Computing topped the list of most productive journals.
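The abstract does not define the relative citation index, but the reported pairs are consistent with dividing an entity's citations per paper by India's average of 11.95. The short check below tests that reading; the formula is an inference from the figures, not a definition stated in the abstract, and the function name is hypothetical.

```python
NATIONAL_CPP = 11.95  # India's average citations per paper, from the abstract

# (citations per paper, reported relative citation index), from the abstract
reported = {
    "IIT Kanpur": (33.04, 2.76),
    "IIT Madras": (24.26, 2.03),
    "G.P.S. Raghava": (86.21, 7.21),
    "K.P. Sudheer": (84.88, 7.1),
}

def relative_citation_index(cpp, baseline_cpp=NATIONAL_CPP):
    """Assumed definition: citations per paper relative to the national average."""
    return cpp / baseline_cpp

for name, (cpp, rci) in reported.items():
    print(f"{name}: computed {relative_citation_index(cpp):.2f}, reported {rci}")
```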