• Title/Summary/Keyword: Data imputation

Association of HLA Genotype and Fulminant Type 1 Diabetes in Koreans

  • Kwak, Soo Heon;Kim, Yoon Ji;Chae, Jeesoo;Lee, Cue Hyunkyu;Han, Buhm;Kim, Jong-Il;Jung, Hye Seung;Cho, Young Min;Park, Kyong Soo
    • Genomics & Informatics
    • /
    • v.13 no.4
    • /
    • pp.126-131
    • /
    • 2015
  • Fulminant type 1 diabetes (T1DM) is a distinct subtype of T1DM that is characterized by rapid-onset hyperglycemia, ketoacidosis, absolute insulin deficiency, and near-normal levels of glycated hemoglobin at initial presentation. Although it has been reported that class II human leukocyte antigen (HLA) genotype is associated with fulminant T1DM, the genetic predisposition is not fully understood. In this study we investigated the HLA genotype and haplotype in 11 Korean cases of fulminant T1DM using imputation of whole exome sequencing data and compared their frequencies with those of 413 participants of the Korean Reference Panel. The HLA-DRB1*04:05-HLA-DQB1*04:01 haplotype was significantly associated with increased risk of fulminant T1DM in Fisher's exact test (odds ratio [OR], 4.11; 95% confidence interval [CI], 1.56 to 10.86; p = 0.009). A histidine residue at HLA-DRβ1 position 13 was marginally associated with increased risk of fulminant T1DM (OR, 2.45; 95% CI, 1.01 to 5.94; p = 0.054). Although we had limited statistical power, we provide evidence that HLA haplotype and amino acid change can be a genetic risk factor for fulminant T1DM in Koreans. Further large-scale research is required to confirm these findings.
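
The kind of 2x2 association test reported in this abstract can be sketched in plain Python. This is only an illustration: the abstract does not give the underlying counts or say how the confidence interval was computed, so the counts below are hypothetical, the interval uses the common log-odds (Woolf) approximation as an assumption, and the function names are made up for this example.

```python
from math import comb, exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for the 2x2 table [[a, b], [c, d]].

    Uses the log-odds (Woolf) approximation; all cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts (NOT from the paper): carriers / non-carriers
# among case and control chromosomes.
or_, lower, upper = odds_ratio_ci(10, 20, 5, 40)
p_value = fisher_exact_two_sided(10, 20, 5, 40)
```

With real data, `a` and `b` would hold the haplotype-positive and haplotype-negative counts among cases, and `c` and `d` the same among controls.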

Improvement of A Preprocessing of Archived Traffic Data Collected by Expressway Vehicle Detection System (고속도로 차량검지기 이력자료 활용을 위한 전처리과정 개선)

  • Lee, Hwan-Pil;NamKoong, Seong;Kim, Soo-Hee;Kim, Jin
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.12 no.1
    • /
    • pp.15-27
    • /
    • 2013
  • The various information collected by vehicle detectors has mainly been used as real-time data. Recently, schemes for applying archived traffic data have become increasingly important. Against this background, this research was conducted to improve the preprocessing of archived traffic data for such applications. The purpose of the improved preprocessing was to make the traffic data reflect actual traffic phenomena. In the evaluation, the improved preprocessing produced values closer to the actual values than the existing preprocessing did.

The Comparison of Imputation Methods in Space Time Series Data with Missing Values (공간시계열모형의 결측치 추정방법 비교)

  • Lee, Sung-Duck;Kim, Duck-Ki
    • Communications for Statistical Applications and Methods
    • /
    • v.17 no.2
    • /
    • pp.263-273
    • /
    • 2010
  • Missing values in time series can be treated as unknown parameters and estimated by maximum likelihood, or as random variables and predicted by the conditional expectation of the unknown values given the data. The purpose of this study is to impute missing values in incomplete data, regarding them either as maximum likelihood estimators or as random variables, and to compare the two methods using ARMA and STAR models. For illustration, the mumps data reported monthly from the national capital region over the years 2001~2009 are used, and the estimation precision for missing values and the forecast precision for future data are compared between the two methods.

The Comparison of Imputation Methods in Time Series Data with Missing Values (시계열자료에서 결측치 추정방법의 비교)

  • Lee, Sung-Duck;Choi, Jae-Hyuk;Kim, Duck-Ki
    • Communications for Statistical Applications and Methods
    • /
    • v.16 no.4
    • /
    • pp.723-730
    • /
    • 2009
  • Missing values in time series can be treated as unknown parameters and estimated by maximum likelihood, or as random variables and predicted by the expectation of the unknown values given the data. The purpose of this study is to impute missing values in incomplete data, regarding them either as maximum likelihood estimators or as random variables, and to compare the two methods using an ARMA model. For illustration, the mumps data reported monthly from the national capital region over the years 2001~2006 are used, and the results from the two methods are compared using SSF (sum of squares for forecasting error).
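
A minimal sketch of the two ingredients named in this abstract, conditional-expectation imputation and the SSF comparison criterion. It simplifies deliberately: it assumes a zero-mean Gaussian AR(1) process with a known coefficient `phi` instead of the ARMA model fitted in the paper, and both function names are hypothetical.

```python
def ar1_conditional_mean(prev_value, next_value, phi):
    """Impute one missing point of a zero-mean Gaussian AR(1) series.

    For x_t = phi * x_{t-1} + e_t, the conditional expectation of a single
    missing x_t given both neighbours is phi * (x_{t-1} + x_{t+1}) / (1 + phi^2).
    """
    return phi * (prev_value + next_value) / (1 + phi ** 2)


def ssf(actual, forecast):
    """Sum of squares for forecasting error between two equal-length series."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast))


# Illustrative values only (not data from the paper).
imputed = ar1_conditional_mean(1.0, 3.0, phi=0.5)
error = ssf([1.0, 2.0, 3.0], [1.0, 1.0, 5.0])
```

Two imputation methods can then be ranked by fitting the model to each completed series, forecasting a held-out period, and preferring the method with the smaller `ssf`.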

Methods for screening time series data according to data quality and statistical status (품질 및 조건 기반 시계열 데이터 선별 활용 방법)

  • Moon, JaeWon;Yu, MiSeon;Oh, SeungTaek;Kum, SeungWoo;Hwang, JiSoo;Lee, JiHoon
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2022.01a
    • /
    • pp.399-402
    • /
    • 2022
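  • This paper introduces a method for screening incomplete time series data before they are used. Because the quality of time series data depends on variable conditions, such as changes over time in the collection network and the collection devices, abnormal or missing data occur irregularly. In such cases, discarding data wholesale because they contain errors, or restoring intervals of missing data unconditionally, can lead to undesired results. The proposed method extracts statistical information about the missing data in each interval of the time series and, based on it, judges data that do not meet the intended purpose and the required quality criteria to be unusable, so that they are automatically excluded in advance from analysis and other uses; this structure is proposed and tested experimentally. The proposed method allows a quick decision, adapted to the purpose and situation, on whether data containing missing values can be used, and can yield better analysis results.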
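
The screening idea described in this abstract can be sketched as follows. The abstract does not say which statistics or thresholds the authors use, so the missing ratio, the longest consecutive gap, the default limits, and all names here are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from math import isnan


@dataclass
class MissingStats:
    total: int            # number of points in the interval
    missing: int          # number of missing points (None or NaN)
    missing_ratio: float  # missing / total (1.0 for an empty interval)
    longest_gap: int      # longest run of consecutive missing points


def missing_stats(values):
    """Summarise the missing data in one interval of a time series."""
    missing = longest = run = 0
    for v in values:
        if v is None or (isinstance(v, float) and isnan(v)):
            missing += 1
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    total = len(values)
    ratio = missing / total if total else 1.0
    return MissingStats(total, missing, ratio, longest)


def is_usable(values, max_missing_ratio=0.2, max_gap=3):
    """Decide whether an interval meets the (assumed) quality criteria."""
    stats = missing_stats(values)
    return (
        stats.total > 0
        and stats.missing_ratio <= max_missing_ratio
        and stats.longest_gap <= max_gap
    )


# Keep only the intervals that pass screening before any analysis.
intervals = [
    [1.0, None, 2.0, 3.0, 4.0, 5.0],
    [1.0, None, None, None, None, 2.0],
]
usable = [iv for iv in intervals if is_usable(iv)]
```

The thresholds would be tuned to the purpose at hand: a forecasting task might tolerate a few short gaps, while a task that needs continuous data could set `max_gap` to zero.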

Data Cleansing Algorithm for reducing Outlier (데이터 오·결측 저감 정제 알고리즘)

  • Lee, Jongwon;Kim, Hosung;Hwang, Chulhyun;Kang, Inshik;Jung, Hoekyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2018.10a
    • /
    • pp.342-344
    • /
    • 2018
  • This paper shows that the proposed algorithm can substitute for statistical methods such as mean imputation, correlation coefficient analysis, and graph correlation analysis, and can take the place of a statistician in processing the various abnormal data measured in the water treatment process. In addition, to improve the reliability and validation of the algorithm, this study aims to model a data-filtering system based on a recent fractile pattern and a deep learning-based LSTM algorithm, using open-source libraries such as Keras, Theano, and TensorFlow.

Household, personal, and financial determinants of surrender in Korean health insurance

  • Shim, Hyunoo;Min, Jung Yeun;Choi, Yang Ho
    • Communications for Statistical Applications and Methods
    • /
    • v.28 no.5
    • /
    • pp.447-462
    • /
    • 2021
  • In insurance, the surrender rate is an important variable that threatens the sustainability of insurers and determines the profitability of the contract. Unlike other actuarial assumptions that determine the cash flow of an insurance contract, however, it is driven by endogenous variables such as people's economic, social, and subjective decisions. Therefore, a microscopic approach is required to identify and analyze the factors that determine the lapse rate. Specifically, micro-level characteristics including the individual, demographic, microeconomic, and household characteristics of policyholders are necessary for the analysis. In this study, we select panel survey data from the Korean Retirement Income Study (KReIS), which cover many diverse dimensions, to determine which variables have a decisive effect on lapse, and apply the lasso regularized regression model to analyze it empirically. As the data contain many missing values, they are imputed using the random forest method. Among the household variables, we find that the non-existence of old dependents, the existence of young dependents, and employed family members increase the surrender rate. Among the individual variables, divorce, non-urban residential areas, apartment-type housing, non-ownership of homes, and a bad relationship with siblings increase the lapse rate. Finally, among the financial variables, low income, low expenditure, the existence of children who incur child care expenditure, not expecting a bequest from a spouse, not holding public health insurance, and expecting to benefit from a retirement pension increase the lapse rate. Some of these findings are consistent with those in the literature.

An Evaluation System For Freeway Traffic Data Processing Techniques (고속도로 교통자료 처리기법 통합평가 시스템 개발)

  • Oh, Dong-Wook;Oh, Cheol;NamKoong, Sung;Jeon, Se-Kil
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.7 no.4
    • /
    • pp.13-24
    • /
    • 2008
  • Real-time traffic data are readily obtainable from the traffic surveillance systems of intelligent transportation systems (ITS). Such data greatly support further applications in the fields of traffic operations, planning, and safety. However, traffic data should be appropriately processed to fully exploit the benefits of this data collection capability. Rather than developing individual data processing techniques, which is the major concern of existing studies, this study proposes a novel methodology for evaluating data processing techniques in an integrated manner. A tool for implementing the proposed methodology is also developed. With the evaluation tool developed in this study, users can extract useful and more reliable traffic data according to their ultimate purpose of data usage. As an example, actual freeway traffic data are fed into the evaluation tool, and the results are discussed.

A Study of Labor Entry of Conditional Welfare Recipients : An Exploration of the Predictors (취업대상 조건부수급자의 경제적 자활로의 진입에 영향을 미치는 요인에 관한 연구)

  • Kim, Kyo-Seong;Kang, Chul-Hee
    • Korean Journal of Social Welfare
    • /
    • v.52
    • /
    • pp.5-32
    • /
    • 2003
  • This paper examines the labor entry of conditional welfare recipients. It focuses on two questions. First, what percentage of conditional welfare recipients achieve labor entry? Second, what are the predictors of labor entry and of the duration until entry? The paper attempts to answer these questions using data on 917 welfare recipients who participated in the self-sufficiency programs of the Offices for Secure Employment in Seoul. Logistic regression analysis and survival analysis are adopted to identify variables predicting the labor entry of conditional welfare recipients. The paper also uses a multiple imputation method to deal with the limitation imposed by missing values in some variables. The major findings are as follows: about 43.8% of the conditional welfare recipients achieve successful labor entry; and for both labor entry and the duration until entry, gender, household, information and referral services for employment, health, and willingness for self-sufficiency are statistically significant predictors. Among these variables, health and willingness for self-sufficiency are the most notable; programs that care for the health of welfare recipients who want to enter the labor market, and counseling programs that strengthen their willingness for labor entry, are recognized as very important for successful labor entry. This paper provides basic knowledge about the realities of conditional welfare recipients' labor entry, identifies areas for further research, and develops policy implications for their self-sufficiency.
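
Multiple imputation, as mentioned in this abstract, typically ends with a pooling step that combines the estimates from each imputed dataset. The sketch below shows the standard pooling formulas (Rubin's rules) for a single coefficient. The abstract does not describe the authors' procedure, so this is an assumed illustration with made-up numbers and a hypothetical function name.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: the coefficient estimated on each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)      # average of the m estimates
    within = mean(variances)      # average within-imputation variance
    between = variance(estimates) # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Illustrative numbers only (not results from the paper).
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what makes the pooled standard error larger than that of any single completed dataset, reflecting the uncertainty due to the missing values.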

An Intelligent Framework for Feature Detection and Health Recommendation System of Diseases

  • Mavaluru, Dinesh
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.3
    • /
    • pp.177-184
    • /
    • 2021
  • All over the world, people are affected by many chronic diseases, and medical practitioners are working hard to identify the symptoms of and remedies for these diseases. Many researchers focus on feature detection for diseases and try to obtain a better health recommendation system. It is necessary to detect the features automatically to provide the most relevant solution for the disease. This research presents the framework of a Health Recommendation System (HRS) for identifying relevant and non-redundant features in the dataset for the prediction and recommendation of diseases. The system consists of three phases: pre-processing, feature selection, and performance evaluation. It supports the handling of missing and noisy data using the proposed Imputation of Missing Data and Noise Detection based Pre-processing algorithm (IMDNDP). The selection of features from the pre-processed dataset is performed by the proposed ensemble-based feature selection using an expert's knowledge (EFS-EK). Detecting and monitoring diseases manually is very difficult and requires expertise in the field, which makes the process time-consuming. Finally, prediction and recommendation can be done using Support Vector Machine (SVM) and rule-based approaches.