Search | Korea Science

Adjustment System for Outlier and Missing Value using Data Storage (데이터 저장소를 이용한 이상치 및 결측치 보정 시스템)

Gwangho Kim;Neunghoe Kim
- The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.5 / pp.47-53 / 2023
With the advent of the 4th Industrial Revolution, a large and diverse body of data has been accumulated. The agricultural community has also used sensors to collect environmental data that affect crop growth in smart farms and open fields. Environmental data have different features depending on where and when they are measured. Studies have used collected agricultural data to predict growth and yield with statistics and artificial intelligence. The results of these studies vary greatly depending on the data on which they are based, so studies to enhance data quality have also been conducted continuously to improve performance. A lot of data is required for high performance, but outliers or missing values in the data can greatly affect the results even if the amount is sufficient. Adjustment of outliers and missing values is therefore essential in data preprocessing. This paper integrates data collected from actual farms and, based on it, proposes an adjustment system for outliers and missing values.
https://doi.org/10.7236/JIIBC.2023.23.5.47

Estimation using response probability when missing data happen on the second occasion

Park, Hyeonah;Na, Seongryong
- Journal of the Korean Data and Information Science Society / v.25 no.1 / pp.263-269 / 2014
When samples are lost in repeated surveys, new samples can often replace the missing values. Estimators using response probability can be considered for repeated surveys on two occasions in which new samples are selected in place of missing data on the second occasion. We propose a new estimator that uses both respondents and new samples on the second occasion. The simulation setting assumes that missing values can occur on the second occasion and are replaced by new samples. The proposed estimator is shown to be more efficient than the estimator that uses a weighting adjustment method for respondents on the second occasion.
https://doi.org/10.7465/jkdi.2014.25.1.263

Travel Time Forecasting in an Interrupted Traffic Flow by adopting Historical Profile and Time-Space Data Fusion (히스토리컬 프로파일 구축과 시.공간 자료합성에 의한 단속류 통행시간 예측)

Yeo, Tae-Dong;Han, Gyeong-Su;Bae, Sang-Hun
- Journal of Korean Society of Transportation / v.27 no.2 / pp.133-144 / 2009
In Korea, ITS projects have been carried out to improve traffic mobility and safety. They also aim to relieve traffic congestion by supplying real-time travel information to drivers and to promote traffic convenience and safety. It is important that traffic information be provided accurately. This study performed outlier elimination and missing data adjustment to improve the accuracy of the raw data, and presents a method for raising the reliability of travel time prediction information. We developed a Historical Profile model and an adjustment formula to reflect the characteristics of interrupted flow. We predicted travel time with the developed Historical Profile model and adjustment formula, and verified the results by comparing the developed model with existing models such as the Neural Network model and the Kalman Filter model. The comparative analysis showed that the developed model and the Kalman Filter model predicted similarly in general situations, but the developed model was more accurate than the other models in incident situations.

Modified BLS Weight Adjustment (수정된 BLS 가중치보정법)

Park, Jung-Joon;Cho, Ki-Jong;Lee, Sang-Eun;Shin, Key-Il
- Communications for Statistical Applications and Methods / v.18 no.3 / pp.367-376 / 2011
BLS weight adjustment is a widely used method for business surveys with non-responses and outliers. Recent surveys show that the non-response weight adjustment of the BLS method is the same as the ratio imputation method. In this paper, we suggest a modified BLS weight adjustment method that imputes missing values instead of using weight adjustment for non-response. Monthly labor survey data are used for a small Monte Carlo simulation, and we conclude that the suggested method is superior to the original BLS weight adjustment method.
https://doi.org/10.5351/CKSS.2011.18.3.367

Pre-Adjustment of Incomplete Group Variable via K-Means Clustering

Hwang, S.Y.;Hahn, H.E.
- Journal of the Korean Data and Information Science Society / v.15 no.3 / pp.555-563 / 2004
In classification and discrimination, we often face an incomplete group variable, typically arising from many missing values and/or non-credible cases. This paper suggests using K-means clustering to pre-adjust the incompleteness, after which classification based on generalized statistical distance is performed. To illustrate the proposed procedure, a simulation study compares it with CART in data mining and with traditional techniques that ignore the incompleteness of the group variable. The simulation study shows that our methodology outperforms these alternatives.

Multiple Imputation Reducing Outlier Effect using Weight Adjustment Methods (가중치 보정을 이용한 다중대체법)

Kim, Jin-Young;Shin, Key-Il
- The Korean Journal of Applied Statistics / v.26 no.4 / pp.635-647 / 2013
Imputation is a commonly used method for handling missing survey data. The performance of an imputation method is influenced by various factors, especially outliers. Removing outliers from a data set is a simple and effective way to reduce their effect. In this paper, to improve the precision of multiple imputation, we study an imputation method that reduces the effect of outliers using various weight adjustment methods, including outlier removal. The regression method in PROC MI in SAS is used for multiple imputation, and the final adjusted weight is used as a weight variable to obtain the imputed values. Simulation studies compare the performance of the various weight adjustment methods, and Monthly Labor Statistic data are used for real data analysis.
https://doi.org/10.5351/KJAS.2013.26.4.635

The Phenomenological study on the Meaning of Family Adjustment Process Experience in Married Immigrant Women (결혼이민여성의 가족적응 과정에 관한 현상학적 연구)

Park, Byung-Kum
- The Journal of the Korea Contents Association / v.13 no.2 / pp.277-295 / 2013
The purpose of this phenomenological study was to explore the meaning of the family adjustment process as perceived by married immigrant women and to enrich our understanding of it. Six married immigrant women participated. Data were collected through in-depth interviews and analyzed using Colaizzi's phenomenological analysis. The findings identified the meaning of the family adjustment process experience of married immigrant women as 37 themes and 8 categories. The 8 categories were "deciding to marry a foreigner", "first meeting and marriage", "starting to live as a Korean", "getting along with husband", "becoming a family with in-laws", "playing one's role as a mother", "missing hometown and family", and "adjusting to living in Korea". Based on the findings, we discuss the meaning of the family adjustment process experience of married immigrant women. Lastly, the results offer suggestions for social welfare policies and practices for these women and their families.
https://doi.org/10.5392/JKCA.2013.13.02.277

Handling the nonresponse in sample survey (설문조사에서의 무응답 처리)

Lee, Hwa-Jung;Kang, Suk-Bok
- Journal of the Korean Data and Information Science Society / v.23 no.6 / pp.1183-1194 / 2012
Nonresponse occurs frequently in surveys, so various methods for handling nonresponse have been applied in survey analysis. In this paper, the rates of occurrence of two types of nonresponse, unit nonresponse and item nonresponse, are presented using data from a previous real survey, and complete data are compared with data containing nonresponse. We suggest reasons why nonresponse occurs and report the nonresponse rate using data collected through group interviews.
https://doi.org/10.7465/jkdi.2012.23.6.1183

Evaluation of the Validity of Risk-Adjustment Model of Acute Stroke Mortality for Comparing Hospital Performance (병원 성과 비교를 위한 급성기 뇌졸중 사망률 위험보정모형의 타당도 평가)

Choi, Eun Young;Kim, Seon-Ha;Ock, Minsu;Lee, Hyeon-Jeong;Son, Woo-Seung;Jo, Min-Woo;Lee, Sang-il
- Health Policy and Management / v.26 no.4 / pp.359-372 / 2016
Background: The purpose of this study was to develop risk-adjustment models for acute stroke mortality based on data from the Health Insurance Review and Assessment Service (HIRA) dataset and to evaluate the validity of these models for comparing hospital performance. Methods: We identified prognostic factors of acute stroke mortality through literature review. On the basis of the available data, the following factors were included in the risk adjustment models: age, sex, stroke subtype, stroke severity, and comorbid conditions. Survey data from 2014 were used for development and the 2012 dataset was analysed for validation. Prediction models of acute stroke mortality by stroke type were developed using logistic regression. Model performance was evaluated using C-statistics, $R^2$ values, and Hosmer-Lemeshow goodness-of-fit statistics. Results: We excluded some clinical factors, such as mental status, vital signs, and lab findings, from the risk adjustment models because no data were available. The ischemic stroke model with age, sex, and stroke severity (categorical) showed good performance (C-statistic=0.881, Hosmer-Lemeshow test p=0.371). The hemorrhagic stroke model with age, sex, stroke subtype, and stroke severity (categorical) also showed good performance (C-statistic=0.867, Hosmer-Lemeshow test p=0.850). Conclusion: Among the risk adjustment models, we recommend the model including age, sex, stroke severity, and stroke subtype for HIRA assessment. However, this model may be inappropriate for comparing hospital performance because of several methodological weaknesses, such as a lack of clinical information, variation across hospitals in the coding of comorbidities, inability to discriminate between comorbidity and complication, missing stroke severity data, and small case numbers at some hospitals. Therefore, further studies are needed to enhance the validity of the risk adjustment model of acute stroke mortality.
https://doi.org/10.4332/KJHPA.2016.26.4.359

A Network Partition Approach for MFD-Based Urban Transportation Network Model

Xu, Haitao;Zhang, Weiguo;Zhuo, Zuozhang
- KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.11 / pp.4483-4501 / 2020
Recent findings indicate that the scatter and shape of the MFD (macroscopic fundamental diagram) are heavily influenced by the spatial distribution of link density in a road network. This implies that the concept of the MFD can be used to divide a heterogeneous road network with different degrees of congestion into multiple homogeneous subnetworks. Because actual traffic data are usually incomplete and inaccurate, while most traffic partition algorithms rely on complete data, in this paper we propose a three-step partition algorithm called Iso-MB (Isoperimetric algorithm - Merging - Boundary adjustment) that permits incomplete input data. The proposed algorithm was implemented and verified in a simulated urban transportation network. The existence of a well-defined MFD in each subnetwork is shown and discussed, and the selection of the stop parameter in the isoperimetric algorithm is explained. The effectiveness of the approach under missing input data is also demonstrated.
https://doi.org/10.3837/tiis.2020.11.013
