• Title/Summary/Keyword: 결측 (missing)

Search Result 428, Processing Time 0.021 seconds

Regression models for interval-censored semi-competing risks data with missing intermediate transition status (중간 사건이 결측되었거나 구간 중도절단된 준 경쟁 위험 자료에 대한 회귀모형)

  • Kim, Jinheum;Kim, Jayoun
    • The Korean Journal of Applied Statistics / v.29 no.7 / pp.1311-1327 / 2016
  • We propose a multi-state model for analyzing semi-competing risks data with interval-censored or missing intermediate events. This model is an extension of the 'illness-death model', which comprises three states: 'healthy', 'diseased', and 'dead'. The 'diseased' state can be regarded as an intermediate event. Two more states are added to the illness-death model to describe missing events caused by loss to follow-up before the end of the study. One is an 'LTF' state, representing loss to follow-up, and the other is an unobservable state representing an intermediate event experienced after LTF occurred. Given covariates, we employ the Cox proportional hazards model with a normal frailty and construct a full likelihood to estimate transition intensities between states in the multi-state model. The full likelihood is marginalized using adaptive Gaussian quadrature, and the regression parameters are estimated through the iterative Newton-Raphson algorithm. Simulation studies are carried out to investigate the finite-sample performance of the proposed estimation procedure in terms of the empirical coverage probability of the true regression parameter. The proposed method is also illustrated with a dataset adapted from Helmer et al. (2001).

A Model for Groundwater Time-series from the Well Field of Riverbank Filtration (강변여과 취수정 주변 지하수위를 위한 시계열 모형)

  • Lee, Sang-Il;Lee, Sang-Ki;Hamm, Se-Yeong
    • Journal of Korea Water Resources Association / v.42 no.8 / pp.673-680 / 2009
  • Alternatives to conventional water resources are being sought because of the scarcity and poor quality of surface water. Riverbank filtration (RBF) is one of them and is considered a promising source of water supply in some cities. Changwon City started RBF in 2001, and field data have been accumulated since. This study develops a time-series model for groundwater level data collected from the pumping area of RBF. The site is Daesan-myeon, Changwon City, where groundwater levels were measured over five years (Jan. 2003 to Dec. 2007). Minute-based groundwater levels were averaged into monthly data to show the long-term behavior. Time-series analysis was conducted according to the Box-Jenkins method. The resulting model turned out to be a seasonal ARIMA model, and its forecasting performance was satisfactory. We believe this study will provide a prototype for other riverbank filtration sites where the predictability of groundwater level is essential for a reliable supply of water.
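
The preprocessing this abstract describes (averaging minute-based levels into monthly values before Box-Jenkins modeling) can be sketched as below. This is a minimal illustration, not the authors' code: the function names, the record layout, and the choice of lag-12 seasonal differencing as a typical first step for a seasonal ARIMA model are all assumptions.

```python
from collections import defaultdict


def monthly_means(records):
    """Average raw readings into one value per (year, month).

    `records` is an iterable of ((year, month), level) pairs; this layout
    is a hypothetical one chosen for the sketch.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, level in records:
        sums[key] += level
        counts[key] += 1
    # Sort by (year, month) so the output is in time order.
    return [(key, sums[key] / counts[key]) for key in sorted(sums)]


def seasonal_difference(series, period=12):
    """Return y[t] - y[t - period], removing a repeating yearly pattern."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Hypothetical readings, not data from the paper.
readings = [((2003, 1), 1.0), ((2003, 1), 3.0), ((2003, 2), 5.0)]
monthly = monthly_means(readings)
```

The differenced series would then be inspected (for example through its autocorrelation function) to choose the model orders, which this sketch does not attempt.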

Traffic Safety Countermeasures According to the Accident Area Patterns and Impact Factor Analysis of the Large-scale Traffic Accident Locations (대형 교통사고 발생지점 유형화와 영향요인 분석에 따른 교통안전대책 방안에 관한 연구)

  • Kim, Bong-Gi;Jeong, Heon-Yeong;Go, Sang-Seon
    • Journal of Korean Society of Transportation / v.24 no.1 s.87 / pp.39-52 / 2006
  • This study classified large-scale traffic accident locations by their characteristics using cluster analysis. Quantification II and Classification and Regression Tree methods were also used to evaluate the degree of influence of each factor by crash type. After these analyses, we tested the fit of the results and suggested a simplification of the quantification index. The discrimination and classification analysis of the four resulting groups showed clear differences between groups according to crash-type characteristics. Thus, measures and supplementary measures for traffic accidents could be suggested systematically by group. However, many missing values in the variables caused a large loss of data and made more detailed analysis difficult. Given this difficulty, mandatory recording of log files in a standardized format is also recommended to prevent this problem in advance.

The Verification of Application of Distributed Runoff Model According to Estimation Methods for the Missing Rainfall Data (결측강우보완방법에 따른 분포형 유출모형의 적용성 검증)

  • Choi, Yong-Joon;Kim, Yeon-Su;Lee, Gi-Ha;Kim, Joo-Cheol
    • Journal of Environmental Science International / v.19 no.12 / pp.1375-1384 / 2010
  • The purpose of this research is to understand how runoff characteristics change with estimated spatial rainfall. This paper is therefore largely composed of two parts. First, after treating some of the observed rainfall data as missing, we compared the simulated results across three estimation methods: ID (Inverse Distance Method), ID2 (Inverse Square Distance Method), and Kr (General Covariance Kriging Method). Second, we reviewed the runoff characteristics of the distributed runoff model according to the estimated spatial rainfall. We selected the Gabchun watershed as the target basin, based on the Yuseong water level station. We assumed that data from 1 or 2 of the 6 rainfall gauge stations in the watershed were missing. We applied the spatial rainfall distributed by Kr to Hy-GIS GRM, a distributed runoff model. When rainfall data from 1 station are missing, Kr is superior to the other methods in both point rainfall estimation and runoff estimation with Hy-GIS GRM. However, when rainfall data from 2 stations are missing, none of the three methods gave suitable results. In conclusion, Kr showed better applicability than the other estimation methods when rainfall data from fewer than 2 stations are missing.
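
The two inverse-distance estimators named in this abstract (ID and ID2) can be illustrated with the standard inverse distance weighting formula, in which each gauge is weighted by 1/d^p with p = 1 or p = 2. The sketch below assumes that standard form. The function name, the coordinates, and the rainfall values are hypothetical, and kriging (Kr) is not covered.

```python
import math


def idw_estimate(target, gauges, power=1.0):
    """Estimate rainfall at `target` from surrounding gauges.

    `target` is an (x, y) pair and `gauges` is a list of (x, y, rainfall)
    tuples. power=1.0 corresponds to ID and power=2.0 to ID2, assuming the
    standard inverse distance weighting formula.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, rain in gauges:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a gauge, so use its reading directly.
            return rain
        weight = 1.0 / distance ** power
        weighted_sum += weight * rain
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical gauge positions (km) and rainfall depths (mm), not data from the paper.
gauges = [(0.0, 0.0, 10.0), (4.0, 0.0, 20.0), (0.0, 3.0, 30.0)]
missing_site = (1.0, 1.0)
id_value = idw_estimate(missing_site, gauges, power=1.0)
id2_value = idw_estimate(missing_site, gauges, power=2.0)
```

A higher power gives nearby gauges more influence, which is the only difference between the two estimators in this sketch.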

Growth of Civic Organizations in South Korea (한국 시민단체의 성장에 대한 양적 연구)

  • Shin, Dong-Joon;Kim, Kwang-Soo;Kim, Jae-On
    • Survey Research / v.6 no.2 / pp.75-101 / 2005
  • This study introduces and analyzes data from the Directory of Korean NGOs, which was published in 1997 and again in 200, to conduct quantitative research on the growth of civic organizations in South Korea. This paper focuses on membership size and founding year, which are essential indicators of organizational growth. Missing rates for these two indicators are checked to evaluate the quality of the data. We examine the changes in membership size between the two time points, 1996 and 1999. The results show a considerable decrease in membership size for civic and advocacy organizations oriented to national issues. This suggests competition among organizations over limited resources, which is consistent with the assumption of a non-linear growth pattern in the ecological theory of organizations. Using founding year data from 1945 to 1996, we estimate pseudo growth curves of civic organizations based on a logistic growth curve model to discuss different growth patterns of organizations across areas of activity.

Korean women wage analysis using selection models (표본 선택 모형을 이용한 국내 여성 임금 데이터 분석)

  • Jeong, Mi Ryang;Kim, Mijeong
    • Journal of the Korean Data and Information Science Society / v.28 no.5 / pp.1077-1085 / 2017
  • In this study, we identified the major factors that affect Korean women's wages by analyzing data from the 2015 Korea Labor Panel Survey (KLIPS). In general, wage data are difficult to analyze because random sampling is infeasible. The Heckman sample selection model is the most widely used method for analyzing data with sample selection. Heckman proposed two kinds of selection models: one is estimated by the maximum likelihood method and the other is the Heckman two-stage model. The Heckman two-stage model is known to be robust to the normality assumption on the bivariate error terms. Recently, Marchenko and Genton (2012) proposed the Heckman selection-t model, which generalizes the Heckman two-stage model, and concluded that the Heckman selection-t model is more robust to the error assumptions. Employing the two models, we analyzed the data and compared the results.
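
In the usual textbook form of the Heckman two-stage model mentioned above, a first-stage probit gives a linear index z for each observed individual, and the second-stage wage regression adds the inverse Mills ratio φ(z)/Φ(z) as an extra regressor to correct for selection. The sketch below only computes that correction term under the normal-error assumption. The function names are hypothetical, and it is not the authors' implementation or the selection-t variant.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills_ratio(z):
    """Selection-correction term phi(z) / Phi(z) for a probit index z."""
    return normal_pdf(z) / normal_cdf(z)


# Hypothetical first-stage probit indices, not estimates from the paper.
indices = [-1.0, 0.0, 1.0]
corrections = [inverse_mills_ratio(z) for z in indices]
```

The ratio shrinks as z grows, so individuals who are very likely to be observed in the wage sample receive only a small correction.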

Empirical Bayesian Misclassification Analysis on Categorical Data (범주형 자료에서 경험적 베이지안 오분류 분석)

  • 임한승;홍종선;서문섭
    • The Korean Journal of Applied Statistics / v.14 no.1 / pp.39-57 / 2001
  • Categorical data sometimes contain misclassification errors. If such data are analyzed, the estimated cell probabilities may be biased and the standard Pearson $X^2$ tests may have inflated type I error rates. On the other hand, if we treat well-classified data as misclassified, we may spend considerable cost and time adjusting for misclassification. It is therefore a necessary and important step to ask whether categorical data are misclassified before analyzing them. In this paper, we consider a two-dimensional contingency table in which one of the two variables is misclassified and the marginal sums of the well-classified variable are fixed. We explore partitioning the marginal sums into cells via the Bound and Collapse concept of Sebastiani and Ramoni (1997). The double sampling scheme (Tenenbein 1970) is used to obtain information on misclassification. We propose test statistics to address misclassification problems and examine their behavior through simulation studies.

Evaluation of the Utility of SSG Algorithm for Image Restoration of Landsat-8 (Landsat 8호 영상 복원을 위한 SSG 기법 활용성 평가)

  • Lee, Mi Hee;Lee, Dalgeun;Yu, Jung Hum;Kim, Jinyoung
    • Korean Journal of Remote Sensing / v.36 no.5_4 / pp.1231-1244 / 2020
  • Landsat satellites are representative optical satellites that have observed the Earth's surface over a long period, and they are suitable for monitoring long-term changes such as disaster preparedness/recovery monitoring, land use change, change detection, and time series monitoring. In this paper, clouds and cloud shadows were detected using QA bands so that clouds could be detected and removed simply and efficiently. The missing area of the experimental image is then restored through the SSG algorithm, which does not directly refer to the pixel values of the reference image but performs restoration using the pixel values in the experimental image. Through this study, we presented the possibility of utilizing the modified SSG algorithm by quantitatively and qualitatively evaluating information on various land cover conditions in the thermal wavelength band as well as the visible wavelength band observing the surface.

Statistical Consideration on the Resources of the Countries in the World (세계 각국의 자원에 대한 통계적 고찰)

  • Huh, Moon-Yul;Choi, Byong-Su;Lee, Seung-Chun
    • The Korean Journal of Applied Statistics / v.22 no.1 / pp.41-57 / 2009
  • The paper investigates the resources of 232 countries based on 39 resources of these countries. The data used in this work come from various sources, such as UN, CIA, World Bank, and OECD reports and the home pages of each country. The purpose of the study is to evaluate which resources are most influential on the wealth of a country, its well-being, or its state of development. For this, a data visualization method is applied. Data visualization, although powerful for exploratory purposes, depends on the user's expertise, and so does the interpretation. As an objective method of investigation, mutual information based on Shannon's entropy theory is applied here. All the statistical methods employed in this paper are processed with DAVIS (Huh and Song, 2002).
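
Mutual information based on Shannon entropy, as mentioned in this abstract, can be computed for two discrete variables as I(X; Y) = H(X) + H(Y) - H(X, Y). The sketch below assumes both variables have already been discretized into categories. The function names and the toy data are hypothetical, and this is not the DAVIS implementation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two equal-length sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Hypothetical categorized indicators for four countries, not data from the paper.
resource_level = ["low", "low", "high", "high"]
wealth_level = ["poor", "poor", "rich", "rich"]
shared_bits = mutual_information(resource_level, wealth_level)
```

A value near zero means the resource tells us little about the outcome, while a value near the entropy of the outcome means the resource nearly determines it.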

One-step spectral clustering of weighted variables on single-cell RNA-sequencing data (단세포 RNA 시퀀싱 데이터를 위한 가중변수 스펙트럼 군집화 기법)

  • Park, Min Young;Park, Seyoung
    • The Korean Journal of Applied Statistics / v.33 no.4 / pp.511-526 / 2020
  • Single-cell RNA-sequencing (scRNA-seq) data consist of each cell's RNA expression extracted from large populations of cells. One main purpose of using scRNA-seq data is to identify inter-cellular heterogeneity. However, scRNA-seq data pose statistical challenges for traditional clustering methods because they have many missing values and a high level of noise due to technical and sampling issues. In this paper, motivated by the analysis of scRNA-seq data, we propose a novel spectral-based clustering method that imposes different weights on genes when computing the similarity between cells. Assigning weights to genes and clustering cells are performed simultaneously in the proposed clustering framework. We solve the proposed non-convex optimization problem using an iterative algorithm. Both a real data application and a simulation study suggest that the proposed clustering method identifies underlying clusters better than existing clustering methods.