• Title/Summary/Keyword: 결측 (missing)

Search results: 428

Determination of the Optimal Aggregation Interval Size of Individual Vehicle Travel Times Collected by DSRC in Interrupted Traffic Flow Section of National Highway (국도 단속류 구간에서 DSRC를 활용하여 수집한 개별차량 통행시간의 최적 수집 간격 결정 연구)

  • PARK, Hyunsuk;KIM, Youngchan
    • Journal of Korean Society of Transportation / v.35 no.1 / pp.63-78 / 2017
  • The purpose of this study is to determine the optimal aggregation interval that increases reliability when estimating a representative value of individual vehicle travel times collected by DSRC equipment in an interrupted traffic flow section of a national highway. We use bimodal asymmetric distribution data, the most representative distribution of individual vehicle travel times collected in interrupted flow sections, estimate the MSE (Mean Square Error) as the aggregation interval varies, and determine the optimal interval. The MSE estimation uses the maximum estimation error equation of the t-distribution, which can be applied to asymmetric distributions. Aggregation intervals of 1-2 minutes were excluded from the analysis because at those sizes data collection is normally interrupted by signal stops, and the resulting missing data introduce additional error during correction; only intervals of 3 minutes or more were considered. As a result, the optimal aggregation interval minimizing the MSE was 3~5 minutes. Considering both the efficiency of system operation and the reliability of travel time calculation, it is effective to operate with a basic aggregation interval of 5 minutes under normal conditions and to reduce it to 3 minutes under congestion.
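The interval-versus-error tradeoff described in this abstract can be sketched numerically: pooling more vehicles into one aggregation interval shrinks the t-distribution maximum estimation error of the representative travel time. The bimodal travel-time distribution, DSRC match rate, and confidence level below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical bimodal travel-time sample (seconds): free-flow vehicles
# versus vehicles delayed by a signal stop in the interrupted-flow section.
travel_times = np.concatenate([
    rng.normal(60, 5, size=600),    # passed without stopping
    rng.normal(110, 10, size=400),  # delayed by the signal
])

def max_estimation_error(sample, conf=0.95):
    """t-distribution maximum estimation error: t * s / sqrt(n)."""
    n = len(sample)
    t = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    return t * sample.std(ddof=1) / np.sqrt(n)

# Larger aggregation intervals pool more matched vehicles, so the
# estimation error of the interval's representative value shrinks.
arrivals_per_min = 20  # assumed DSRC match rate (vehicles/minute)
for interval in (3, 4, 5, 10, 15):
    sub = rng.choice(travel_times, size=interval * arrivals_per_min,
                     replace=False)
    print(interval, "min:", round(max_estimation_error(sub), 2), "s")
```

The study's point is that this error reduction must be traded against responsiveness: a 5-minute interval is efficient in normal operation, while 3 minutes tracks congestion better.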

Additive hazards models for interval-censored semi-competing risks data with missing intermediate events (결측되었거나 구간중도절단된 중간사건을 가진 준경쟁적위험 자료에 대한 가산위험모형)

  • Kim, Jayoun;Kim, Jinheum
    • The Korean Journal of Applied Statistics / v.30 no.4 / pp.539-553 / 2017
  • We propose a multi-state model to analyze semi-competing risks data with interval-censored or missing intermediate events. The model extends the three-state illness-death model (healthy, diseased, and dead), in which the 'diseased' state is the intermediate event. Two states are added to incorporate events missed because of loss to follow-up before the end of the study: a lost-to-follow-up (LTF) state, and an unobservable state representing an intermediate event experienced after LTF. Given covariates, we employ the Lin and Ying additive hazards model with log-normal frailty and construct a conditional likelihood to estimate the transition intensities between states. The full likelihood is marginalized using adaptive importance sampling, and the regression parameters are estimated with an iterative quasi-Newton algorithm. Simulation studies investigate the finite-sample performance of the proposed method in terms of the empirical coverage probability of the true regression parameters. The method is also illustrated with a dataset adapted from Helmer et al. (2001).
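A minimal sketch of the Lin-Ying additive hazards form underlying this model, λ(t|Z) = λ0(t) + β'Z, with a constant baseline and a single binary covariate so that event times are exponential; all numbers are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

# Under an additive hazard with constant baseline, the hazard of the
# Z=1 group exceeds that of the Z=0 group by exactly beta.
lam0, beta = 0.10, 0.05
Z = rng.binomial(1, 0.5, size=5000)      # a binary covariate
rates = lam0 + beta * Z                  # lambda(t|Z) = lambda0 + beta*Z
times = rng.exponential(1.0 / rates)     # exponential event times

# For exponential data, 1 / (mean time) estimates the hazard, so the
# difference of group-wise hazard estimates recovers beta.
beta_hat = 1 / times[Z == 1].mean() - 1 / times[Z == 0].mean()
print(round(beta_hat, 3))
```

The actual paper estimates such intensities for every transition of the five-state model jointly, with frailty and censoring, which this toy decomposition does not attempt.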

Traffic Correction System Using Vehicle Axles Counts of Piezo Sensors (피에조센서의 차량 축 카운트를 활용한 교통량보정시스템)

  • Jung, Seung-Weon;Oh, Ju-Sam
    • The Journal of the Korea Contents Association / v.21 no.1 / pp.277-283 / 2021
  • Traffic volume data by vehicle classification are fundamental inputs in fields such as road and traffic design. Traffic volumes are collected through permanent and temporary surveys and published as Annual Average Daily Traffic (AADT) in the statistical yearbook of road traffic. Permanent surveys use traffic collection equipment (AVC), which consists of a loop sensor that detects traffic volume and a piezo sensor that detects the number of axles. Because this equipment is buried in the roadway, detector failures produce missing data. The existing correction method uses historical data and traffic trends at nearby points, but it does not reflect temporal and spatial characteristics, and the data used for correction may themselves be corrected values. In this study, we propose a method that corrects missing traffic volumes by calculating an axle correction coefficient from the accumulated axle counts acquired by the piezo sensor. This reflects the temporal and spatial characteristics that limit existing methods, and in a comparative evaluation its error rate was lower than that of the existing methods. With its simple algorithm, the traffic volume correction system using axle counts is judged to be a correction method applicable to field systems.
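The core idea can be sketched in a few lines: from hours where both sensors worked, learn the mean number of axles per vehicle (the axle correction coefficient), then divide the piezo sensor's accumulated axle count by it for hours where the loop count is missing. The hourly counts below are invented for illustration.

```python
# Hypothetical hourly records: (loop_count, axle_count); None marks an
# hour where the loop sensor failed but the piezo sensor kept counting.
hourly = [
    (410, 935),
    (380, 870),
    (None, 902),   # loop detector failure -> missing traffic volume
    (455, 1040),
]

# Axle correction coefficient: mean axles per vehicle over valid hours.
valid = [(v, a) for v, a in hourly if v is not None]
axle_coeff = sum(a for _, a in valid) / sum(v for v, _ in valid)

# Impute missing volumes as accumulated axles / coefficient.
imputed = [v if v is not None else round(a / axle_coeff) for v, a in hourly]
print(round(axle_coeff, 3), imputed)
```

Because the coefficient is computed from the same site and period, the correction inherits the temporal and spatial characteristics that historical-trend methods lack.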

Spatial Gap-filling of GK-2A/AMI Hourly AOD Products Using Meteorological Data and Machine Learning (기상모델자료와 기계학습을 이용한 GK-2A/AMI Hourly AOD 산출물의 결측화소 복원)

  • Youn, Youjeong;Kang, Jonggu;Kim, Geunah;Park, Ganghyun;Choi, Soyeon;Lee, Yangwon
    • Korean Journal of Remote Sensing / v.38 no.5_3 / pp.953-966 / 2022
  • Since aerosols adversely affect human health, for example by deteriorating air quality, quantitative observation of their distribution and characteristics is essential. Satellite-based Aerosol Optical Depth (AOD) data are widely used as a periodic and quantitative information source at the global scale, but optical-sensor-based satellite AOD images have missing pixels in cloud-covered areas. In this study, we produced gap-free GeoKompsat 2A (GK-2A) Advanced Meteorological Imager (AMI) hourly AOD images by building a Random Forest based gap-filling model that uses gridded meteorological and geographic variables as inputs. The model achieved a Mean Bias Error (MBE) of -0.002 and a Root Mean Square Error (RMSE) of 0.145, exceeding the target accuracy of the original product, with a Correlation Coefficient (CC) of 0.714; considering that the target is an atmospheric variable, the model has sufficient explanatory power. The high temporal resolution of geostationary satellites is suitable for observing diurnal variation, and the gap-filled product can support other research, such as input for atmospheric correction, estimation of ground-level PM, and analysis of small fires or pollutants.
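The gap-filling setup described above, train a Random Forest on clear-sky pixels and predict the cloud-masked ones, can be sketched as follows. The three synthetic predictors and the AOD-like target are stand-ins for the paper's gridded meteorological and geographic inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic grid: three predictor fields (e.g. humidity, wind, elevation)
# and an AOD-like target with a smooth signal plus noise.
n = 2000
X = rng.uniform(size=(n, 3))
aod = 0.3 * X[:, 0] + 0.1 * np.sin(6 * X[:, 1]) + 0.05 * rng.normal(size=n)

cloudy = rng.uniform(size=n) < 0.3           # pixels masked by cloud
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[~cloudy], aod[~cloudy])          # train only on clear pixels

filled = aod.copy()
filled[cloudy] = model.predict(X[cloudy])    # gap-fill the cloudy pixels

rmse = np.sqrt(np.mean((filled[cloudy] - aod[cloudy]) ** 2))
print("RMSE on gap-filled pixels:", round(rmse, 3))
```

In the real product the validation is against held-out satellite retrievals rather than a known truth, but the train-on-clear, predict-on-cloudy structure is the same.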

Development of Machine Learning Based Precipitation Imputation Method (머신러닝 기반의 강우추정 방법 개발)

  • Heechan Han;Changju Kim;Donghyun Kim
    • Journal of Wetlands Research / v.25 no.3 / pp.167-175 / 2023
  • Precipitation data are one of the essential inputs in fields such as wetland management, hydrological simulation, and water resource management. To manage water resources efficiently, it is essential to secure as much data as possible by minimizing the missing rate, and hydrological simulation becomes more effective when precipitation data for ungauged areas are available. However, missing precipitation data have mainly been estimated with statistical equations. The purpose of this study is to propose a new method that restores missing precipitation data using machine learning algorithms, which can predict new data from correlations among existing data, and to evaluate their applicability against existing statistical methods. Two representative algorithms, Artificial Neural Network (ANN) and Random Forest (RF), were applied. In classifying the occurrence of precipitation, RF was more accurate than ANN: its F1-score and Accuracy were 0.80 and 0.77, versus 0.76 and 0.71 for ANN. RF also estimated precipitation amounts more accurately, with an RMSE of 2.8 mm/day against 2.9 mm/day for ANN and corresponding evaluation values of 0.68 and 0.73, respectively.
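The two-stage scheme implied by the abstract, first classify whether precipitation occurred, then regress the amount for wet days, can be sketched with Random Forests on synthetic neighbour-gauge data. The three "neighbour stations" and the target record are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)

# Synthetic daily rainfall at three nearby stations (mm), used to restore
# the record at a target station; gamma amounts with ~40% wet days.
n = 1500
neighbours = rng.gamma(2.0, 3.0, size=(n, 3)) * (rng.uniform(size=(n, 3)) < 0.4)
target = np.clip(neighbours.mean(axis=1) + rng.normal(0, 1, size=n), 0, None)

train = rng.uniform(size=n) < 0.8   # pretend 20% of the record is missing

# Stage 1: classify occurrence of precipitation at the target station.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(neighbours[train], target[train] > 0)

# Stage 2: regress the amount, applied only to days classified as wet.
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(neighbours[train], target[train])

wet = clf.predict(neighbours[~train])
restored = np.where(wet, reg.predict(neighbours[~train]), 0.0)

print("occurrence F1:", round(f1_score(target[~train] > 0, wet), 2))
```

The study's comparison swaps the classifier/regressor for an ANN and scores both against statistical interpolation; the pipeline shape stays the same.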

A Study on Hydraulic Stability of Stream Reach using GIS (GIS 기반의 하천의 수리학적 안정성 분석 연구)

  • Park, Min-Ji;Park, Geun-Ae;Kim, Seong-Joon
    • Proceedings of the Korea Spatial Information System Society Conference / 2005.11a / pp.245-250 / 2005
  • To date, most domestic studies using SMS's RMA-2 for two-dimensional flow analysis have examined velocity distribution and water level fluctuation at confluences or in curved channels. However, the stream inflow and outflow stage entered as boundary conditions are difficult to measure in the field, and because such records contain many missing values, most studies have simulated steady flow rather than actual conditions. In this study, RMA was simulated as unsteady flow using WMS's HEC-1 program. The simulation showed that the actual stream, affected by tributaries at each frequency, had velocities varying from 0.05 to 0.46.


Monte Carlo Random Permutation Tests for Incompletely Ranked Data (불완전 순위 자료를 위한 몬테칼로 임의순열 검정)

  • Huh, Myung-Hoe;Choi, Won
    • The Korean Journal of Applied Statistics / v.14 no.1 / pp.191-199 / 2001
  • This paper studies methods for testing the null hypothesis of no difference in preference among k objects, based on incomplete ranking data obtained from n judges evaluating the objects. We propose a scheme for multiply imputing the missing values in the given data, together with a Monte Carlo random permutation test that combines the imputations through an averaged p-value.
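The proposal can be sketched end to end in plain Python: impute each judge's missing ranks from the unused ranks, run a permutation test on a Friedman-type statistic, repeat over several imputed datasets, and average the p-values. The rankings below and the number of imputations/permutations are illustrative.

```python
import random

random.seed(7)

# Hypothetical incomplete rankings: each row is one judge's ranks of
# k = 4 objects, with None where the judge did not rank an object.
rankings = [
    [1, 2, 3, None],
    [1, 3, None, 2],
    [2, 1, 4, 3],
    [1, None, 3, 2],
    [1, 2, 4, 3],
]
k = 4

def impute(row):
    """Fill missing ranks uniformly at random from the unused ranks."""
    used = {r for r in row if r is not None}
    free = [r for r in range(1, k + 1) if r not in used]
    random.shuffle(free)
    it = iter(free)
    return [r if r is not None else next(it) for r in row]

def friedman_stat(rows):
    """Sum of squared deviations of column rank-sums from their mean."""
    sums = [sum(col) for col in zip(*rows)]
    mean = sum(sums) / k
    return sum((s - mean) ** 2 for s in sums)

def permutation_p(rows, n_perm=500):
    """Permutation p-value: shuffle ranks within each judge."""
    obs = friedman_stat(rows)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for row in rows:
            r = row[:]
            random.shuffle(r)
            shuffled.append(r)
        if friedman_stat(shuffled) >= obs:
            hits += 1
    return hits / n_perm

# Multiple imputation: average permutation p-values over imputed datasets.
p_values = [permutation_p([impute(r) for r in rankings]) for _ in range(10)]
print("averaged p-value:", sum(p_values) / len(p_values))
```

Averaging p-values across imputations is one simple combining rule; the paper's exact combining procedure may differ in detail.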


Classification of Heart Disease Using K-Nearest Neighbor Imputation (K-최근접 이웃 알고리즘을 활용한 심장병 진단 및 예측)

  • Park, Pyoung-Woo;Lee, Seok-Won
    • Proceedings of the Korea Information Processing Society Conference / 2017.11a / pp.742-745 / 2017
  • This paper applies data mining techniques to the heart disease domain: missing values in existing patient records are imputed with the K-nearest neighbor algorithm, and three representative predictive classifiers, naive Bayes, support vector machine, and multilayer perceptron, are then applied and their results compared and analyzed. The experiments include a K-optimization step and were performed with 10-fold cross-validation; comparisons are judged by accuracy and the kappa statistic.
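The preprocessing-plus-classification pipeline described above can be sketched with scikit-learn's `KNNImputer` followed by one of the classifiers (naive Bayes here) under 10-fold cross-validation. The four numeric features and the missingness pattern are synthetic stand-ins for the heart-disease attributes.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)

# Synthetic patient table: 4 numeric attributes, label driven by two of
# them, and ~10% of values missing at random.
n = 500
X = rng.normal(size=(n, 4))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 0.5, n) > 0).astype(int)
X_missing = np.where(rng.uniform(size=X.shape) < 0.1, np.nan, X)

# Replace each missing value using its K nearest complete neighbours.
X_filled = KNNImputer(n_neighbors=5).fit_transform(X_missing)

# 10-fold cross-validated accuracy of a naive Bayes classifier.
acc = cross_val_score(GaussianNB(), X_filled, y, cv=10).mean()
print("mean CV accuracy:", round(acc, 3))
```

The paper's K-optimization step would wrap the imputer in a loop over `n_neighbors` and keep the value giving the best cross-validated score.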

An Empirical Comparison of Bagging, Boosting and Support Vector Machine Classifiers in Data Mining (데이터 마이닝에서 배깅, 부스팅, SVM 분류 알고리즘 비교 분석)

  • Lee Yung-Seop;Oh Hyun-Joung;Kim Mee-Kyung
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.343-354 / 2005
  • The goal of this paper is to compare classification performance and to find a better classifier based on the characteristics of the data. The compared methods are CART with two ensemble algorithms, bagging or boosting, and SVM. In an empirical study of twenty-eight data sets, SVM had a smaller error rate than the other methods on most data sets. Comparing bagging, boosting, and SVM by data characteristics, SVM is suited to data with small numbers of observations and no missing values; boosting is suited to data with large numbers of observations; and bagging is suited to data with missing values.
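The comparison can be reproduced in miniature with scikit-learn: tree-based bagging and boosting versus an SVM, scored by cross-validated error rate. This uses one synthetic dataset rather than the paper's twenty-eight, so it only illustrates the protocol, not the findings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# One synthetic dataset standing in for the paper's benchmark collection.
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           random_state=0)

models = {
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                 random_state=0),
    "boosting": AdaBoostClassifier(n_estimators=50, random_state=0),
    "svm": SVC(kernel="rbf"),
}

# Error rate = 1 - mean cross-validated accuracy, per classifier.
error = {name: 1 - cross_val_score(m, X, y, cv=5).mean()
         for name, m in models.items()}
print(error)
```

Running this over many datasets with different sizes and missingness patterns is what lets the paper map classifier choice to data characteristics.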