• Title/Summary/Keyword: Data

Detection and Correction Method of Erroneous Data Using Quantile Pattern and LSTM

  • Hwang, Chulhyun;Kim, Hosung;Jung, Hoekyung
    • Journal of information and communication convergence engineering / v.16 no.4 / pp.242-247 / 2018
  • Data from K-Water waterworks are collected from various sensors and used as basic data for the operation and analysis of various devices. The sensor data are therefore highly important, but they contain erroneous values caused by the characteristics of sensors exposed to the external environment. However, existing cleansing methods concentrate on predicting missing data, so research on detecting and correcting erroneous data is scarce. This study detects erroneous data by converting the collected data into quintiles and patterning them. The accuracy of detecting false data intentionally inserted into real data was higher than that of the conventional method in all cases. In future research, we will verify the proposed system's efficiency and accuracy in various environments.
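
The quantile-patterning idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the LSTM stage is omitted, and the helper names (`learn_patterns`, `flag_unseen`), the five-bin split, and the window width of 3 are all assumptions made for illustration. Readings are mapped to quantile bins learned from clean data, and any window whose bin pattern never occurred in the clean data is flagged.

```python
from bisect import bisect_right
from statistics import quantiles


def to_bins(values, cuts):
    """Map each reading to the index of the quantile bin it falls into."""
    return [bisect_right(cuts, v) for v in values]


def learn_patterns(clean, n_bins=5, width=3):
    """Learn cut points and the set of bin patterns seen in clean data.

    n_bins=5 (quintiles) and width=3 are illustrative assumptions.
    """
    cuts = quantiles(clean, n=n_bins)  # n_bins - 1 cut points
    bins = to_bins(clean, cuts)
    seen = {tuple(bins[i:i + width]) for i in range(len(bins) - width + 1)}
    return cuts, seen


def flag_unseen(values, cuts, seen, width=3):
    """Return start indices of windows whose bin pattern was never seen."""
    bins = to_bins(values, cuts)
    return [
        i
        for i in range(len(bins) - width + 1)
        if tuple(bins[i:i + width]) not in seen
    ]


# Hypothetical sensor trace: a slow rise and fall, repeated three times.
clean = [10, 11, 12, 13, 14, 15, 14, 13, 12, 11] * 3
cuts, seen = learn_patterns(clean)

# The same kind of trace with one injected spike at position 3.
suspect = [10, 11, 12, 30, 12, 11, 10]
flagged = flag_unseen(suspect, cuts, seen)
print(flagged)  # start indices of windows that overlap the spike
```

A pattern lookup like this only detects abrupt transitions; a sequence model would be needed to propose a corrected value for a flagged reading.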

Automatic Algorithm for Cleaning Asset Data of Overhead Transmission Line (가공송전 전선 자산데이터의 정제 자동화 알고리즘 개발 연구)

  • Mun, Sung-Duk;Kim, Tae-Joon;Kim, Kang-Sik;Hwang, Jae-Sang
    • KEPCO Journal on Electric Power and Energy / v.7 no.1 / pp.73-77 / 2021
  • As big data analysis technologies have developed worldwide, the importance of asset management for electric power facilities based on data analysis is increasing. It is essential to secure the quality of the data, which determines the performance of the risk evaluation algorithm for asset management. To improve the reliability of asset management, asset data must be preprocessed. In particular, dirty data must be cleaned, and an algorithm that reduces the time and improves the accuracy of data treatment is urgently needed. This paper presents an automatic cleaning algorithm specialized for overhead transmission asset data. The algorithm cleans data by analyzing the quality and overall pattern of the raw data.

Big Data Key Challenges

  • Alotaibi, Sultan
    • International Journal of Computer Science & Network Security / v.22 no.4 / pp.340-350 / 2022
  • The term big data refers to large volumes of data with complicated structures that are difficult to collect, store, process, and analyze. Big data analytics refers to uncovering hidden patterns in big data. This information and these data sets could be useful and enable advanced services. However, analyzing and processing this information could disclose sensitive and personal information when it is held in applications tied to users, such as location-based services; concerns are smaller when the applications deal with general information such as scientific results. This work surveys security and privacy challenges and approaches in big data. The challenges covered fall into the following areas: privacy, access control, encryption, and authentication. Likewise, the approaches presented are privacy-preserving, access control, encryption, and authentication approaches in big data.

Dual Image Reversible Data Hiding Scheme Based on Secret Sharing to Increase Secret Data Embedding Capacity (비밀자료 삽입용량을 증가시키기 위한 비밀 공유 기반의 이중 이미지 가역 정보은닉 기법)

  • Kim, Pyung Han;Ryu, Kwan-Woo
    • Journal of Korea Multimedia Society / v.25 no.9 / pp.1291-1306 / 2022
  • The dual image-based reversible data hiding scheme embeds secret data into two images to increase the embedding capacity. Because such a scheme can transmit a large amount of secret data, various schemes have been proposed in recent years. In 2021, Chen and Hong proposed a dual image-based reversible data hiding scheme that embeds a large amount of secret data using a reference matrix, secret data, and bit values. This paper proposes a scheme that can embed more secret data than Chen and Hong's scheme. To achieve this goal, the proposed scheme generates polynomials and shared values using a secret sharing scheme, and embeds secret data using a reference matrix, septenary numbers, and random values. Experimental results show that the proposed scheme can transmit more secret data to the receiver while maintaining image quality similar to that of other dual image-based reversible data hiding schemes.

A Study on Voice Communication over Data Communication Network (데이터 통신망에서 음성통신에 대한 연구)

  • 우홍체
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2000.11a / pp.471-475 / 2000
  • Voice and data are transmitted over a single packetized data communications network that was designed for data communications. The public switched telephone network for voice and the packet data network for data are merging into a single data network to gain efficiency and reduce operational cost. However, integrating voice and data transmission over a single data network is not easy, because voice should be transmitted without delay whereas data should be transmitted without error. Advances in technology are beginning to overcome these basic differences. Several methods for integrating voice and data are examined and reviewed here, and trends and problems in integration are also discussed.

Obtaining bootstrap data for the joint distribution of bivariate survival times

  • Kwon, Se-Hyug
    • Journal of the Korean Data and Information Science Society / v.20 no.5 / pp.933-939 / 2009
  • Bivariate data in clinical research often have two types of failure times: a mark variable for the first failure time, and the final failure time. This paper shows how to generate bootstrap data to obtain Bayesian estimates of the joint distribution of bivariate survival times. The observed data were generated by Frank's family, and the fake data were simulated with the Gamma prior of survival time. The bootstrap data were obtained by combining the mimic data with the observed data and the fake data simulated from the observed data.

A Study on a Statistical Matching Method Using Clustering for Data Enrichment

  • Kim Soon Y.;Lee Ki H.;Chung Sung S.
    • Communications for Statistical Applications and Methods / v.12 no.2 / pp.509-520 / 2005
  • Data fusion is defined as the process of combining data and information from different sources to make more effective use of the information they contain. In this paper, we propose a data fusion algorithm that uses the k-means clustering method for data enrichment, to improve data quality in the knowledge discovery in databases (KDD) process. An empirical study comparing the proposed data fusion technique with existing techniques shows that the proposed clustering-based technique has a low MSE for continuous fusion variables.
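
A clustering-based fusion of this kind can be sketched as follows. This is an assumed illustration, not the paper's algorithm: the function names (`kmeans`, `fuse`), the cluster-mean imputation rule, and the toy data are hypothetical. Donor records carry both the common variables `x` and the fusion variable `z`; recipient records carry only `x`. Donors are clustered on `x`, and each recipient receives the mean `z` of its nearest cluster.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def fuse(donor_x, donor_z, recipient_x, k=2):
    """Impute z for each recipient as the mean z of its nearest donor cluster."""
    centroids, labels = kmeans(donor_x, k)
    overall = sum(donor_z) / len(donor_z)  # fallback for an empty cluster
    means = []
    for c in range(k):
        zs = [z for z, lab in zip(donor_z, labels) if lab == c]
        means.append(sum(zs) / len(zs) if zs else overall)
    return [means[nearest(x, centroids)] for x in recipient_x]


# Hypothetical donor file: two well-separated groups on the common variables.
donor_x = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
donor_z = [10, 12, 14, 100, 102, 104]
recipient_x = [(0.5, 0.5), (10.5, 10.5)]

imputed = fuse(donor_x, donor_z, recipient_x)
print(imputed)
```

Imputing the cluster mean suits a continuous fusion variable; for a categorical one, the cluster mode or a randomly drawn donor within the cluster would be the analogous choice.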

A Study on Policies to Revitalize the Public Big Data in Seoul (서울시 공공빅데이터 활성화 방안 연구)

  • Choi, Bong;Yun, Jongjin;Um, Taehyee
    • Knowledge Management Research / v.20 no.3 / pp.73-89 / 2019
  • The purpose of this study is to investigate the current state of public big data in Seoul and to suggest policy directions for its revitalization. Big data is perceived as an innovation resource in the era of the Fourth Industrial Revolution and the data economy. Public big data in particular plays a significant role, compared with the private sector, in providing universal access for citizens, startups, and enterprises. Seoul reorganized its government substructure to focus on big data and established organizations such as the Big Data Campus and the Urban Data Science Lab. Although the amount of open public data in Seoul has increased, few datasets have the characteristics of big data, such as volume, velocity, and value. To set out a direction for big data policy in Seoul, we investigate the current status of the Big Data Campus and the Urban Data Science Lab operated by Seoul City. Based on the results of this study, we propose several directions that Seoul can use in establishing big data strategies.

A Big Data-Driven Business Data Analysis System: Applications of Artificial Intelligence Techniques in Problem Solving

  • Donggeun Kim;Sangjin Kim;Juyong Ko;Jai Woo Lee
    • The Journal of Bigdata / v.8 no.1 / pp.35-47 / 2023
  • It is crucial to develop effective and efficient big data analytics methods for business problem-solving, in order to improve the performance of data analytics and to reduce costs and risks in the analysis of customer data. In this study, a big data-driven data analysis system using artificial intelligence techniques is designed to increase the accuracy of big data analytics amid the rapid growth of the field of data science. We present a key direction for big data analysis systems through missing value imputation, outlier detection, feature extraction, explainable artificial intelligence techniques, and exploratory data analysis. Our objective is not only to develop big data analysis techniques for business data with complex structures but also to bridge the gap between theoretical ideas in artificial intelligence methods and the analysis of real-world business data.

System Construction and Data Development of National Standard Reference for Renewable Energy - Model-Based Standard Meteorological Year (신재생에너지 국가참조표준 시스템 구축 및 개발 - 모델 기반 표준기상년)

  • Boyoung Kim;Chang Ki Kim;Chang-yeol Yun;Hyun-goo Kim;Yong-heack Kang
    • New & Renewable Energy / v.20 no.1 / pp.95-101 / 2024
  • Since 1990, the Renewable Big Data Research Lab at the Korea Institute of Energy Technology has been observing solar radiation at 16 sites across South Korea. Serving as the National Reference Standard Data Center for Renewable Energy since 2012, it produces essential data for the sector. By 2020, it had standardized meteorological year data from 22 sites. Users demand data from approximately 260 sites, matching the number of South Korea's municipalities, but this need exceeds what measurement-based data can provide. In response, our team developed a method to derive solar radiation data from satellite images, covering South Korea in 400,000 grids of 500 m × 500 m each. Utilizing satellite-derived data and ERA5-Land reanalysis data from the European Centre for Medium-Range Weather Forecasts (ECMWF), we produced standard meteorological year data for 1,000 sites. Our research also focused on data measurement traceability and uncertainty estimation, ensuring the reliability of our model data and the traceability of existing measurement-based data.