• Title/Summary/Keyword: Data Collection and Preprocessing

An Effective Smart Greenhouse Data Preprocessing System for Autonomous Machine Learning (자율 기계 학습을 위한 효과적인 스마트 온실 데이터 전처리 시스템)

  • Jongtae Lim;RETITI DIOP EMANE Christopher;Yuna Kim;Jeonghyun Baek;Jaesoo Yoo
    • Smart Media Journal / v.12 no.1 / pp.47-53 / 2023
  • Research on smart farms, which create new value by combining information and communication technology (ICT) with agriculture, has recently been active. For domestic smart farm technology to reach the productivity level of advanced agricultural countries, automated decision-making using machine learning is necessary. However, current smart greenhouse data collection technologies in Korea are not sufficient for big data analysis or machine learning. In this paper, we design and implement a smart greenhouse data preprocessing system for autonomous machine learning. The proposed system applies various preprocessing techniques to the target data, evaluates the performance of each technique, and stores the optimal preprocessing technique for each data set. The stored optimal techniques are then used to preprocess newly collected data.
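The apply, evaluate, store, and reuse loop described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' system: the three gap-filling techniques, the mean-absolute-error score on deliberately hidden readings, and all names (`TECHNIQUES`, `select_best`, `best_by_sensor`) are assumptions made for the example.

```python
from statistics import mean, median

def fill_mean(values):
    """Replace missing readings (None) with the mean of the observed ones."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]

def fill_median(values):
    """Replace missing readings (None) with the median of the observed ones."""
    m = median([v for v in values if v is not None])
    return [m if v is None else v for v in values]

def fill_forward(values):
    """Carry the last observed reading forward; a leading gap takes the first observed value."""
    last = next(v for v in values if v is not None)
    out = []
    for v in values:
        last = last if v is None else v
        out.append(last)
    return out

# Hypothetical candidate techniques; a real system would register many more.
TECHNIQUES = {"mean": fill_mean, "median": fill_median, "forward": fill_forward}

def score(technique, complete, hidden_idx):
    """Mean absolute error on readings that were hidden from the technique."""
    masked = [None if i in hidden_idx else v for i, v in enumerate(complete)]
    filled = technique(masked)
    return mean(abs(filled[i] - complete[i]) for i in hidden_idx)

def select_best(complete, hidden_idx):
    """Evaluate every technique and return the name of the lowest-error one."""
    scores = {name: score(fn, complete, hidden_idx) for name, fn in TECHNIQUES.items()}
    return min(scores, key=scores.get), scores

# Store the best technique per data stream, then reuse it on new data.
best_by_sensor = {}
temperature = [20.0, 20.5, 21.0, 21.5, 22.0, 22.5, 23.0, 23.5]
best_by_sensor["temperature"], scores = select_best(temperature, {2, 5})

new_batch = [24.0, None, 25.0]
cleaned = TECHNIQUES[best_by_sensor["temperature"]](new_batch)
```

On this steadily rising series, forward filling has the lowest error, so it is the technique stored for the hypothetical `temperature` stream and applied to `new_batch`.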

Development of an intelligent IIoT platform for stable data collection (안정적 데이터 수집을 위한 지능형 IIoT 플랫폼 개발)

  • Woojin Cho;Hyungah Lee;Dongju Kim;Jae-hoi Gu
    • The Journal of the Convergence on Culture Technology / v.10 no.4 / pp.687-692 / 2024
  • The energy crisis is emerging as a serious problem around the world. In Korea, there is great interest in energy efficiency research on industrial complexes, which use more than 53% of total energy and account for more than 45% of greenhouse gas emissions in the country. One such study concerns saving energy in an industrial complex, called a virtual energy network plant, by sharing facilities between factories that use the same utility and through transactions between energy-producing and energy-demanding factories. In such energy-saving research, data collection is very important because the data have various uses, such as analysis and prediction. However, existing systems had several shortcomings in reliably collecting time series data. In this study, we propose an intelligent IIoT platform to address them. The platform includes a preprocessing system that identifies abnormal data and processes it in a timely manner, classifies abnormal and missing data, and presents interpolation techniques to maintain stable time series data. Additionally, time series data collection is streamlined through database optimization. This paper contributes to increasing data usability in industrial environments through stable data collection and rapid problem response, and to reducing the burden of data collection and optimizing monitoring load by introducing a variety of chatbot notification systems.
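The classify-then-interpolate step mentioned in this abstract can be illustrated with a short sketch. The abstract does not specify the platform's rules, so the fixed valid range used to flag abnormal samples, the choice of linear interpolation, and the function names are all assumptions.

```python
def classify(values, low, high):
    """Label each sample as 'missing' (None), 'abnormal' (out of range), or 'normal'."""
    labels = []
    for v in values:
        if v is None:
            labels.append("missing")
        elif not (low <= v <= high):
            labels.append("abnormal")
        else:
            labels.append("normal")
    return labels

def interpolate(values, labels):
    """Replace non-normal samples by linear interpolation between the nearest
    normal neighbours; gaps at either end copy the nearest normal value."""
    good = [i for i, lab in enumerate(labels) if lab == "normal"]
    if not good:
        raise ValueError("no normal samples to interpolate from")
    out = list(values)
    for i, lab in enumerate(labels):
        if lab == "normal":
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            out[i] = values[right]
        elif right is None:
            out[i] = values[left]
        else:
            w = (i - left) / (right - left)
            out[i] = values[left] + w * (values[right] - values[left])
    return out

# Hypothetical evenly spaced sensor readings with one gap and one spike.
readings = [10.0, None, 14.0, 999.0, 18.0]
labels = classify(readings, low=0.0, high=100.0)
repaired = interpolate(readings, labels)
```

Keeping the labels separate from the repaired values lets a monitoring layer report how many samples were missing versus abnormal, while downstream analysis sees a gap-free series.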

Identifying research trends in the emergency medical technician field using topic modeling (토픽모델링을 활용한 응급구조사 관련 연구동향)

  • Lee, Jung Eun;Kim, Moo-Hyun
    • The Korean Journal of Emergency Medical Services / v.26 no.2 / pp.19-35 / 2022
  • Purpose: This study aimed to identify research topics in the emergency medical technician (EMT) field and examine research trends. Methods: In this study, 261 research papers published between January 2000 and May 2022 were collected, and EMT research topics and trends were analyzed using topic modeling techniques. This study used a text mining technique and proceeded through data collection, keyword preprocessing, and analysis. Keyword preprocessing and data analysis were done with RStudio version 4.0.0. Results: Keywords were derived through topic modeling analysis, and eight topics were ultimately identified: patient treatment, various roles, the performance of duties, cardiopulmonary resuscitation, triage systems, job stress, disaster management, and education programs. Conclusion: Based on the results, further study is needed on the development and application of education programs that can successfully increase the emergency care capabilities of EMTs.

Image Classification Model using web crawling and transfer learning (웹 크롤링과 전이학습을 활용한 이미지 분류 모델)

  • Lee, JuHyeok;Kim, Mi Hui
    • Journal of IKEEE / v.26 no.4 / pp.639-646 / 2022
  • In this paper, to address the need for large datasets, we collect images through web crawling and build datasets for image classification models through a data preprocessing process. We also propose a lightweight model that incorporates transfer learning to classify images automatically and add category values, and an image classification model that reduces training time and achieves high accuracy.

Changes in Measuring Methods of Walking Behavior and the Potentials of Mobile Big Data in Recent Walkability Researches (보행행태조사방법론의 변화와 모바일 빅데이터의 가능성 진단 연구 - 보행환경 분석연구 최근 사례를 중심으로 -)

  • Kim, Hyunju;Park, So-Hyun;Lee, Sunjae
    • Journal of the Architectural Institute of Korea Planning & Design / v.35 no.1 / pp.19-28 / 2019
  • The purpose of this study is to evaluate the walking behavior analysis methodologies used in previous studies, paying attention to the demand for empirical data collection for urban and neighborhood planning. Previous studies are divided by method into (1) recording, (2) surveys, (3) statistical data, (4) global positioning system (GPS) devices, and (5) mobile big data analysis. Next, we analyze the previous research and identify changes in walkability research: (1) empirical data on people's actual walking and movement patterns have come to be required, and (2) micro-level walking behaviors such as actual route, walking facilities, detours, and walking area have begun to be measured. In addition, the research trend shows that GPS devices and mobile big data have newly emerged as data sources. Finally, we analyze pedestrian data based on mobile big data in terms of 'application', distinguishing it from existing survey methodology, and present the potential of mobile big data: (1) easing the human, temporal, and spatial constraints of data collection, (2) reducing the inaccuracy of collected data, (3) reducing subjective intervention in data collection and preprocessing, and (4) the expandability of walking environment research.

Designing Bigdata Platform for Multi-Source Maritime Information

  • Junsang Kim
    • Journal of the Korea Society of Computer and Information / v.29 no.1 / pp.111-119 / 2024
  • In this paper, we propose a big data platform that can collect information from various sources at sea. Ocean-related big data platforms currently in operation focus on storing and sharing data that has already been created, and each data provider is responsible for data collection and preprocessing. Collecting and integrating data in a marine environment, where communication networks are poor compared to those on land, is costly and inefficient, making it difficult to implement the related infrastructure. In particular, in fields that require real-time data collection and analysis, such as weather information, radar, and sensor data, a number of issues must be considered compared to land-based systems, such as data security, the characteristics of organizations and ships, and data collection costs, in addition to communication network issues. This paper first defines these problems and presents solutions. To design a big data platform that reflects them, we first propose a data source, hierarchical MEC, and data flow structure, and then present an overall platform structure that integrates them all.

Generation of Time-Series Data for Multisource Satellite Imagery through Automated Satellite Image Collection (자동 위성영상 수집을 통한 다종 위성영상의 시계열 데이터 생성)

  • Yunji Nam;Sungwoo Jung;Taejung Kim;Sooahm Rhee
    • Korean Journal of Remote Sensing / v.39 no.5_4 / pp.1085-1095 / 2023
  • Time-series data generated from satellite data are crucial resources for change detection and monitoring across various fields. Existing research in time-series data generation primarily relies on single-image analysis to maintain data uniformity, with ongoing efforts to enhance spatial and temporal resolutions by utilizing diverse image sources. Despite the emphasized significance of time-series data, there is a notable absence of automated data collection and preprocessing for research purposes. In this paper, to address this limitation, we propose a system that automates the collection of satellite information in user-specified areas to generate time-series data. This research aims to collect data from various satellite sources in a specific region and convert them into time-series data, developing an automatic satellite image collection system for this purpose. By utilizing this system, users can collect and extract data for their specific regions of interest, making the data immediately usable. Experimental results have shown the feasibility of automatically acquiring freely available Landsat and Sentinel images from the web and incorporating manually inputted high-resolution satellite images. Comparisons between automatically collected and edited images based on high-resolution satellite data demonstrated minimal discrepancies, with no significant errors in the generated output.

A Development of Preprocessing Models of Toll Collection System Data for Travel Time Estimation (통행시간 추정을 위한 TCS 데이터의 전처리 모형 개발)

  • Lee, Hyun-Seok;NamKoong, Seong J.
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.8 no.5 / pp.1-11 / 2009
  • Toll collection system (TCS) data reflect the characteristics of traffic conditions. However, TCS data contain outliers that cannot represent the travel time of the section concerned, and if these outliers are not eliminated, the travel time may be distorted. Travel times for the same section and time can be widely distributed, because the variation in travel time increases as the section distance increases, which makes it difficult to calculate a representative travel time. Accordingly, it is important to understand travel time characteristics in order to compute the representative travel time from TCS data. In this study, after analyzing the variation ratio of travel time according to link distance and level of congestion, an outlier elimination model and a smoothing model for TCS data were proposed. The results show that the proposed models can be used to estimate a reliable travel time for a long-distance path in which travel times vary for the same departure time, the intervals are large, and the representative travel time changes irregularly over a short period.
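The two stages named in this abstract, outlier elimination followed by smoothing, can be sketched generically. The abstract does not give the paper's formulas, so this example substitutes a common median absolute deviation (MAD) filter and a centered moving average; the threshold `k`, the window size, and the sample travel times are assumptions.

```python
from statistics import mean, median

def remove_outliers(times, k=3.0):
    """Drop travel times farther than k * MAD from the median of the interval."""
    med = median(times)
    mad = median([abs(t - med) for t in times])
    if mad == 0:
        return list(times)  # all values (nearly) identical; nothing to drop
    return [t for t in times if abs(t - med) <= k * mad]

def representative(times, k=3.0):
    """Representative travel time of one departure interval after filtering."""
    return median(remove_outliers(times, k))

def moving_average(series, window=3):
    """Centered moving average; the window shrinks at the ends of the series."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(mean(series[lo:hi]))
    return out

# Hypothetical travel times in minutes, grouped by departure interval.
intervals = [
    [30, 31, 29, 95],  # 95: e.g. a vehicle that stopped along the way
    [32, 33, 31, 34],
    [36, 35, 37, 2],   # 2: e.g. a mismatched entry/exit record
]
reps = [representative(group) for group in intervals]
smoothed = moving_average(reps, window=3)
```

Filtering inside each interval before smoothing across intervals keeps a single extreme record from pulling the representative value, while the moving average damps irregular interval-to-interval jumps.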

Research on artificial intelligence based battery analysis and evaluation methods using electric vehicle operation data (전기 차 운행 데이터를 활용한 인공지능 기반의 배터리 분석 및 평가 방법 연구)

  • SeungMo Hong
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.6 / pp.385-391 / 2023
  • As the use of electric vehicles has increased to minimize carbon emissions, analyzing the state and performance of the lithium-ion batteries that are instrumental in electric vehicles has become important. A comprehensive analysis is required that uses not only the voltage, current, and temperature of the battery pack, which can affect the condition and performance of the battery, but also the driving data and charging pattern data of the electric vehicle. Therefore, this study covers the collection and preprocessing of battery data from electric vehicles, the collection and preprocessing of data on driver driving habits in addition to simple battery data, and the detailed design and modification of an artificial intelligence algorithm based on the analyzed influencing factors. In this paper, we gathered operational data and battery data from electric buses in real time. These data sets were then used to train a Random Forest algorithm. Furthermore, a comprehensive assessment of battery status, operation, and charging patterns was conducted using an explainable artificial intelligence (XAI) algorithm. The study identified crucial influencing factors on battery status: rapid acceleration, rapid deceleration, and sudden stops in driving patterns; the number of drives per day in the charging and discharging pattern; daily accumulated depth of discharge (DOD); cell voltage differences during discharge; maximum cell temperature; and minimum cell temperature. These factors were confirmed to significantly impact the battery condition. Based on the identified influencing factors, a battery analysis and evaluation model was designed and assessed using the Random Forest algorithm. The results contribute to the understanding of battery health and lay the foundation for effective battery management in electric vehicles.

A study on Digital Agriculture Data Curation Service Plan for Digital Agriculture

  • Lee, Hyunjo;Cho, Han-Jin;Chae, Cheol-Joo
    • Journal of the Korea Society of Computer and Information / v.27 no.2 / pp.171-177 / 2022
  • In this paper, we propose a service method that can provide insight into multi-source agricultural data, cluster environmental factors to support data analysis over time, and curate crop environmental factors. The proposed curation service consists of four steps: collection, preprocessing, storage, and analysis. First, in the collection step, the service system collects and organizes multi-source agricultural data by using an OpenAPI-based web crawler. Second, in the preprocessing step, the system performs data smoothing to reduce data measurement errors. Here, we adopt a smoothing method for each type of facility, considering the error rate associated with facility characteristics such as greenhouses and open fields. Third, in the storage step, an agricultural data integration schema and a Hadoop HDFS-based storage structure are proposed for large-scale agricultural data. Finally, in the analysis step, the service system performs DTW-based time series classification in consideration of the characteristics of agricultural digital data. Through the DTW-based classification, the accuracy of prediction results is improved by reflecting the characteristics of time series data without any loss. As future work, we plan to implement the proposed service method and apply it to a smart farm greenhouse for testing and verification.
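The DTW-based time series classification in the analysis step can be illustrated with a minimal sketch. It shows classic dynamic time warping with absolute difference as the local cost, combined with a 1-nearest-neighbour rule; the classifier choice, the labels, and the sample series are assumptions and are not taken from the paper.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences,
    using absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def classify_1nn(series, labelled):
    """Return the label of the labelled series closest to `series` under DTW."""
    return min(labelled, key=lambda item: dtw_distance(series, item[0]))[1]

# Hypothetical labelled temperature profiles of different lengths.
labelled = [
    ([18, 20, 24, 28, 24, 20], "high_temp_profile"),
    ([10, 12, 13, 12, 11, 10], "low_temp_profile"),
]
query = [18, 19, 21, 24, 28, 25, 20]
label = classify_1nn(query, labelled)
```

Because DTW aligns sequences of different lengths and tolerates shifts in time, the seven-point `query` can be compared directly with six-point references without resampling.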