• Title/Summary/Keyword: MachineLearning


Research on Outlier and Missing Value Correction Methods to Improve Smart Farm Data Quality (스마트팜 데이터 품질 향상을 위한 이상치 및 결측치 보정 방법에 관한 연구)

  • Sung-Jae Lee;Hyun Sim
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.19 no.5
    • /
    • pp.1027-1034
    • /
    • 2024
  • This study aims to address the issues of outliers and missing values in AI-based smart farming in order to improve data quality and enhance the accuracy of agricultural prediction tasks. By utilizing real data provided by the Rural Development Administration (RDA) and the Korea Agency of Education, Promotion, and Information Service in Food, Agriculture, Forestry, and Fisheries (EPIS), outlier detection and missing value imputation techniques were applied to collect and manage high-quality data. For successful smart farm operations, an IoT-based AI automatic growth measurement model is essential, and achieving a high data quality index through stable data preprocessing is crucial. In this study, various methods for correcting outliers and imputing missing values in growth data were applied, and the proposed preprocessing strategies were validated using machine learning performance evaluation indices. The results showed significant improvements in model performance, with high predictive accuracy observed in key evaluation metrics such as the ROC curve and AUC.
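The abstract does not include the preprocessing code; as a rough, hypothetical sketch of the kind of pipeline it describes (outlier detection followed by missing-value imputation on sensor readings), IQR fencing plus time-based interpolation in pandas might look like the following. The column name `temperature`, the fence multiplier, and the sample values are assumptions, not the authors' settings.

```python
import pandas as pd
import numpy as np

def clean_sensor_series(df: pd.DataFrame, col: str, k: float = 1.5) -> pd.DataFrame:
    """Flag IQR outliers in `col`, replace them with NaN, then interpolate.

    Illustrative pipeline only, not the authors' exact method.
    """
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr

    # Treat values outside the IQR fence as outliers and mark them missing.
    df = df.copy()
    df.loc[(df[col] < lower) | (df[col] > upper), col] = np.nan

    # Impute the remaining gaps by time-based linear interpolation.
    df[col] = df[col].interpolate(method="time").ffill().bfill()
    return df

# Hypothetical greenhouse sensor log with one spike and one missing reading.
times = pd.date_range("2024-05-01", periods=6, freq="h")
df = pd.DataFrame({"temperature": [21.3, 21.8, 95.0, np.nan, 22.4, 22.9]}, index=times)
print(clean_sensor_series(df, "temperature"))
```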

The optimal method for imputing missing data in the preprocessing phase to enhance the performance of a DNN-based construction period prediction model

  • Haneul LEE;Yeongchae YUN;Youkyung KIM;Seokheon YUN
    • International conference on construction engineering and project management
    • /
    • 2024.07a
    • /
    • pp.271-276
    • /
    • 2024
  • The success of construction projects is influenced by various factors, with accurate management and prediction of the construction period playing a crucial role. The construction period is determined through contracts between the client and the contractor, and it is considered a key element in the management of construction projects, alongside cost management. To ensure the successful completion of projects, accurate prediction of the construction period is essential, as it aids in the efficient allocation of time and resources. The main objective of this study is to maximize the performance of construction period prediction models by applying and comparing various methods for handling missing data. Optimizing the model's performance requires accuracy and completeness of data, with the process of outlier removal and missing data imputation potentially having a significant impact on the model's predictive capability. During this process, the effect of changes in the dataset on model performance will be closely examined to identify the most effective method for handling missing data. Outlier removal and missing data imputation are crucial steps in the data preprocessing phase, and they can significantly improve the model's accuracy and reliability. This research aims to apply these data preprocessing methods and analyze their outcomes to find the most effective missing data imputation method for construction period prediction. After the selection process, considering the model's performance and stability, the mode imputation method was identified as the most suitable for predicting the construction period. The findings of this research are expected to contribute not only to improving the accuracy of construction period predictions but also to enhancing the overall efficiency and success rate of construction project management.
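For readers unfamiliar with imputation comparisons of this kind, a minimal, hypothetical scikit-learn sketch is shown below: mean, median, and mode (`most_frequent`) imputation are compared by cross-validation on synthetic data. The study itself trains a DNN on real construction records; the random-forest regressor, the feature meanings, and the synthetic target here are stand-ins.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical project attributes (e.g., storeys, structure type code, site grade)
# on an integer scale, with roughly 10% of cells missing; the real study uses
# actual construction records and a DNN rather than this stand-in regressor.
X = rng.integers(0, 10, size=(200, 5)).astype(float)
y = X @ np.array([30.0, 5.0, -3.0, 2.0, 0.5]) + rng.normal(scale=5.0, size=200)
X[rng.random(X.shape) < 0.1] = np.nan

# Compare mean, median, and mode ("most_frequent") imputation by cross-validation.
for strategy in ["mean", "median", "most_frequent"]:
    model = Pipeline([
        ("impute", SimpleImputer(strategy=strategy)),
        ("regressor", RandomForestRegressor(n_estimators=100, random_state=0)),
    ])
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{strategy:>13s}: mean CV R^2 = {score:.3f}")
```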

A Bibliometric Analysis of Global Research Trends in Digital Therapeutics (디지털 치료기기의 글로벌 연구 동향에 대한 계량서지학적 분석)

  • Dae Jin Kim;Hyeon Su Kim;Byung Gwan Kim;Ki Chang Nam
    • Journal of Biomedical Engineering Research
    • /
    • v.45 no.4
    • /
    • pp.162-172
    • /
    • 2024
  • To analyse the overall research trends in digital therapeutics, this study conducted a quantitative bibliometric analysis of articles published over the last 10 years, from 2014 to 2023. We extracted bibliographic information on studies related to digital therapeutics from the Web of Science (WOS) database and performed publication-status, citation, and keyword analyses using the R (version 4.3.1) and VOSviewer (version 1.6.18) software. A total of 1,114 articles were included in the study, and the annual publication growth rate for digital therapeutics was 66.1%, a very rapid increase. "health" was the most frequently used keyword based on Keyword Plus, and "cognitive-behavioral therapy", "depression", "healthcare", "mental-health", "meta-analysis" and "randomized controlled-trial" are the research keywords that have driven the development and impact of digital therapeutic devices over the long term. A total of five clusters were observed in the co-occurrence network analysis, with new research keywords such as "artificial intelligence", "machine learning" and "regulation" appearing in recent years. In our analysis of research trends in digital therapeutics, keywords related to mental health, such as depression, anxiety, and disorder, ranked highest by occurrences and total link strength. While many studies have shown the positive effects of digital therapeutics, low engagement and high dropout rates remain a concern, and much research is being done to evaluate and address them. Future studies should expand the search terms to ensure the representativeness of the results.
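As a hedged illustration of how a keyword co-occurrence network of the kind analysed with VOSviewer can be counted, the short Python sketch below builds pairwise co-occurrence weights from made-up Keyword Plus lists; it is not the study's pipeline, and the keyword lists are invented examples.

```python
from itertools import combinations
from collections import Counter

# Hypothetical Keyword Plus lists for a handful of articles (illustrative only).
article_keywords = [
    ["health", "depression", "mental-health"],
    ["health", "machine learning", "randomized controlled-trial"],
    ["depression", "cognitive-behavioral therapy", "mental-health"],
    ["artificial intelligence", "machine learning", "regulation"],
]

# Count how often each pair of keywords appears together in the same article;
# these counts are the edge weights of a co-occurrence network.
cooccurrence = Counter()
for keywords in article_keywords:
    for pair in combinations(sorted(set(keywords)), 2):
        cooccurrence[pair] += 1

for (a, b), weight in cooccurrence.most_common(5):
    print(f"{a} -- {b}: {weight}")
```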

Privacy-Preserving Cryptographic API Misuse Detection Framework Using Homomorphic Encryption (동형 암호를 활용한 프라이버시 보장 암호화 API 오용 탐지 프레임워크)

  • Seungho Kim;Hyoungshick Kim
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.5
    • /
    • pp.865-873
    • /
    • 2024
  • In this study, we propose a privacy-preserving cryptographic API misuse detection framework utilizing homomorphic encryption. The proposed framework is designed to effectively detect cryptographic API misuse while maintaining data confidentiality. We employ a Convolutional Neural Network (CNN)-based detection model and optimize its structure to ensure high accuracy even in an encrypted environment. Specifically, to enable efficient homomorphic operations, we leverage depth-wise convolutional layers and a cubic activation function to secure non-linearity, enabling effective misuse detection on encrypted data. Experimental results show that the proposed model achieved a high F1-score of 0.978, and the total execution time for the homomorphically encrypted model was 11.20 seconds, demonstrating near real-time processing efficiency. These findings confirm that the model offers excellent security and accuracy even when operating in a homomorphic encryption environment.
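The paper's exact architecture and encryption parameters are not reproduced here; the plaintext PyTorch sketch below only illustrates the two ingredients named in the abstract, a depthwise convolution and a cubic activation, which are friendly to homomorphic evaluation because they rely solely on additions and multiplications. All layer sizes and the input representation are assumptions.

```python
import torch
import torch.nn as nn

class CubicActivation(nn.Module):
    """x^3 is a polynomial, so it can be evaluated homomorphically (unlike ReLU)."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x ** 3

class DepthwiseCNN(nn.Module):
    """Illustrative detector head; dimensions are assumptions, not the paper's."""
    def __init__(self, in_channels: int = 8, seq_len: int = 64, n_classes: int = 2):
        super().__init__()
        # groups=in_channels makes the convolution depthwise (one filter per channel),
        # which keeps the multiplicative depth low for homomorphic evaluation.
        self.depthwise = nn.Conv1d(in_channels, in_channels, kernel_size=3,
                                   padding=1, groups=in_channels)
        self.act = CubicActivation()
        self.pool = nn.AvgPool1d(2)          # average pooling is also HE-friendly
        self.fc = nn.Linear(in_channels * (seq_len // 2), n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(self.act(self.depthwise(x)))
        return self.fc(x.flatten(1))

# Hypothetical batch of 4 embedded API-usage sequences (8 channels, length 64).
model = DepthwiseCNN()
logits = model(torch.randn(4, 8, 64))
print(logits.shape)  # torch.Size([4, 2])
```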

Development of System for Enhancing the Quality of Power Generation Facilities Failure History Data Based on Explainable AI (XAI) (XAI 기반 발전설비 고장 기록 데이터 품질 향상 시스템 개발)

  • Kim Yu Rim;Park Jeong In;Park Dong Hyun;Kang Sung Woo
    • Journal of Korean Society for Quality Management
    • /
    • v.52 no.3
    • /
    • pp.479-493
    • /
    • 2024
  • Purpose: The deterioration in the quality of failure history data, caused by differences in how workers at power plants interpret failures and by the lack of consistency in the way failures are recorded, negatively impacts the efficient operation of power plants. The purpose of this study is to propose a system that consistently classifies power generation facility failures based on the failure history text data created by the workers. Methods: This study utilizes data collected from three coal unloaders operated by Korea Midland Power Co., LTD, from 2012 to 2023. It classifies failures based on the results of Soft Voting, which combines the prediction probabilities obtained by applying the predict_proba technique to four machine learning models (Random Forest, Logistic Regression, XGBoost, and SVM) with scores obtained by constructing word dictionaries for each type of failure using LIME, one of the XAI (Explainable Artificial Intelligence) methods. Through this, a failure classification system is proposed to improve the quality of power generation facility failure history data. Results: The results of this study are as follows. When the power generation facility failure classification system was applied to the failure history data of the Continuous Ship Unloader, XGBoost showed the best performance with a Macro_F1 Score of 93%. When the system proposed in this study was applied, the Macro_F1 Score for Logistic Regression increased by up to 0.17 compared to applying the model alone. All four models used in this study, when the system was applied, showed equal or higher Accuracy and Macro_F1 Score values than the single models alone. Conclusion: This study proposes a failure classification system for power generation facilities to improve the quality of failure history data. This will contribute to cost reduction and the stability of power generation facilities, as well as to further improvement of power plant operation efficiency.
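The soft-voting step described above can be illustrated with a small, hypothetical scikit-learn pipeline: the predict_proba outputs of four base models are averaged over TF-IDF features of toy failure texts. GradientBoostingClassifier stands in for XGBoost to keep the sketch dependency-free, the LIME word-dictionary scoring is omitted, and the texts and labels are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Toy failure-history texts and labels (illustrative only; the study uses Korean
# coal-unloader maintenance records from 2012-2023).
texts = ["conveyor motor overheated", "hydraulic oil leak detected",
         "motor bearing vibration alarm", "hydraulic pump pressure drop"] * 10
labels = ["motor", "hydraulic", "motor", "hydraulic"] * 10

# Soft voting averages the predict_proba outputs of the base models, as in the paper;
# GradientBoostingClassifier is a stand-in for XGBoost.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    voting="soft",
)
clf = make_pipeline(TfidfVectorizer(), ensemble)
clf.fit(texts, labels)
print(clf.predict(["motor overheating alarm"]))
```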

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.1
    • /
    • pp.187-204
    • /
    • 2016
  • Document classification based on emotional polarity has become a welcome emerging task owing to the great explosion of data on the Web. In the big data age, there are too many information sources to refer to when making decisions. For example, when considering travel to a city, a person may search reviews from a search engine such as Google or social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide whether or not to make a trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology is widely accepted for text mining of the Web. Sentiment analysis has been used to classify documents through machine learning techniques such as decision trees, neural networks, and support vector machines (SVMs), and is used to determine the attitude, position, and sensibility of the people who write articles about various topics published on the Web. Regardless of the polarity of customer reviews, emotional reviews are very helpful materials for analyzing the opinions of customers. Sentiment analysis helps with understanding what customers really want, instantly, through the help of automated text mining techniques: it applies text mining to text on the Web to extract subjective information and to determine the attitudes or positions of the people who wrote the articles and expressed opinions about a particular topic. In this study, we developed a model that selects hot topics from user posts on China's online stock forum by using the k-means algorithm and a self-organizing map (SOM). In addition, we developed a detection model to predict hot topics by using machine learning techniques such as logit, the decision tree, and SVM. We employed sentiment analysis to develop our model for the selection and detection of hot topics from China's online stock forum. The sentiment analysis calculates a sentiment value for each document by comparing and classifying its terms against a polarity sentiment dictionary (positive or negative). The online stock forum was an attractive site because of its information about stock investment. Users post numerous texts about stock movements, analyzing the market in light of government policy announcements, market reports, reports from economic research institutes, and even rumors. We divided the online forum's topics into 21 categories for the sentiment analysis. One hundred forty-four topics were selected among the 21 categories of the online stock forum. The posts were crawled to build a positive and negative text database. We ultimately obtained 21,141 posts on 88 topics by preprocessing the text from March 2013 to February 2015. An interest index was defined to select the hot topics, and the k-means algorithm and SOM produced equivalent results on these data. We developed decision tree models to detect hot topics with three algorithms: CHAID, CART, and C4.5. The results of CHAID were subpar compared to the others. We also employed SVM to detect the hot topics from negative data. The SVM models were trained with a radial basis function (RBF) kernel, tuned by grid search, to detect the hot topics.
The detection of hot topics using sentiment analysis provides investors with the latest trends and hot topics in the stock forum so that they no longer need to search through the vast amount of information on the Web. Our proposed model is also helpful for rapidly determining customers' signals or attitudes towards government policy and firms' products and services.
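As a minimal illustration of the grid-searched RBF-kernel SVM mentioned in the abstract, the sketch below tunes C and gamma on synthetic topic-level features; the feature definitions, labels, and grid values are assumptions, not the study's data or settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(42)

# Hypothetical topic-level features (e.g., post volume, mean sentiment score) and
# a binary "hot topic" label; the real study derives these from forum posts.
X = rng.normal(size=(150, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=150) > 0).astype(int)

# Grid search over the RBF kernel's C and gamma, as described in the abstract.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="f1")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```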

Validation of nutrient intake of smartphone application through comparison of photographs before and after meals (식사 전후의 사진 비교를 통한 스마트폰 앱의 영양소섭취량 타당도 평가)

  • Lee, Hyejin;Kim, Eunbin;Kim, Su Hyeon;Lim, Haeun;Park, Yeong Mi;Kang, Joon Ho;Kim, Heewon;Kim, Jinho;Park, Woong-Yang;Park, Seongjin;Kim, Jinki;Yang, Yoon Jung
    • Journal of Nutrition and Health
    • /
    • v.53 no.3
    • /
    • pp.319-328
    • /
    • 2020
  • Purpose: This study was conducted to evaluate the validity of the Gene-Health application in terms of estimating energy and macronutrient intakes. Methods: The subjects were 98 healthy adults participating in a weight-control intervention study. They recorded their diets in the Gene-Health application, took photographs before and after every meal on the same day, and uploaded them to the application. The amounts of foods and drinks consumed were estimated from the photographs by trained experts, and the nutrient intakes were calculated using the CAN-Pro 5.0 program; this method was named 'Photo Estimation'. The energy and macronutrient intakes estimated from the Gene-Health application were compared with those from Photo Estimation. The mean differences in energy and macronutrient intakes between the two methods were compared using paired t-tests. Results: The mean energy intakes from Gene-Health and Photo Estimation were 1,937.0 kcal and 1,928.3 kcal, respectively. There were no significant differences in intakes of energy, carbohydrate, fat, and energy from fat (%) between the two methods. The protein intake and energy from protein (%) from Gene-Health were higher than those from Photo Estimation, whereas the energy from carbohydrate (%) from Photo Estimation was higher than that from Gene-Health. The Pearson correlation coefficients, weighted Kappa coefficients, and adjacent agreements for energy and macronutrient intakes between the two methods ranged from 0.382 to 0.607, 0.588 to 0.649, and 79.6% to 86.7%, respectively. Conclusion: The Gene-Health application shows acceptable validity as a dietary intake assessment tool for energy and macronutrients. Further studies with female subjects and various age groups are needed.
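The paired t-test and Pearson correlation used in this comparison can be reproduced with scipy, as in the hedged sketch below; the intake values are simulated for illustration and are not the study's measurements.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical energy intakes (kcal) for the same 98 subjects measured by the app
# and by photo-based estimation; values are simulated, not the study's data.
app_kcal = rng.normal(1937, 400, size=98)
photo_kcal = app_kcal + rng.normal(0, 250, size=98)

t_stat, p_value = stats.ttest_rel(app_kcal, photo_kcal)   # paired t-test
r, r_p = stats.pearsonr(app_kcal, photo_kcal)             # Pearson correlation

print(f"paired t-test: t = {t_stat:.2f}, p = {p_value:.3f}")
print(f"Pearson r = {r:.3f} (p = {r_p:.3g})")
```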

Thermal Characteristics of Daegu using Land Cover Data and Satellite-derived Surface Temperature Downscaled Based on Machine Learning (기계학습 기반 상세화를 통한 위성 지표면온도와 환경부 토지피복도를 이용한 열환경 분석: 대구광역시를 중심으로)

  • Yoo, Cheolhee;Im, Jungho;Park, Seonyoung;Cho, Dongjin
    • Korean Journal of Remote Sensing
    • /
    • v.33 no.6_2
    • /
    • pp.1101-1118
    • /
    • 2017
  • Temperatures in urban areas are steadily rising due to rapid urbanization and ongoing climate change. Since the spatial distribution of heat in a city varies by region, it is crucial to investigate the detailed thermal characteristics of urban areas. Recently, many studies have been conducted to identify thermal characteristics of urban areas using satellite data. However, satellite data are not sufficient for precise analysis due to the trade-off between temporal and spatial resolutions. In this study, in order to examine the thermal characteristics of Daegu Metropolitan City during the summers between 2012 and 2016, Moderate Resolution Imaging Spectroradiometer (MODIS) daytime and nighttime land surface temperature (LST) data at 1 km spatial resolution were downscaled to a spatial resolution of 250 m using a machine learning method called random forest. Compared to the original 1 km LST, the downscaled 250 m LST showed a higher correlation between the proportion of impervious area and mean land surface temperature in Daegu at the administrative neighborhood level. Hot spot analysis was then conducted using the downscaled daytime and nighttime 250 m LST. The clustered hot spot areas for daytime and nighttime were compared and examined based on the land cover data provided by the Ministry of Environment. The high-value hot spots were relatively more clustered in industrial and commercial areas during the daytime and in residential areas at night. The thermal characterization of urban areas using the method proposed in this study is expected to contribute to the establishment of city and national security policies.
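A simplified, hypothetical version of random-forest downscaling (fit the LST-predictor relation at the coarse 1 km scale, then apply it to 250 m predictors) is sketched below with scikit-learn; the predictor set, the synthetic values, and the omission of any residual-correction step are assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)

# Coarse (1 km) training samples: fine-scale predictors aggregated to 1 km
# (e.g., NDVI, impervious fraction, elevation) paired with the 1 km MODIS LST.
X_coarse = rng.uniform(size=(500, 3))
lst_coarse = 30 + 8 * X_coarse[:, 1] - 5 * X_coarse[:, 0] + rng.normal(0, 0.5, 500)

# Fit the relation at 1 km, then apply it to 250 m predictors to obtain 250 m LST
# (residual correction between scales is omitted for brevity).
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_coarse, lst_coarse)

X_fine = rng.uniform(size=(2000, 3))        # hypothetical 250 m predictor grid
lst_fine = rf.predict(X_fine)
print(lst_fine[:5].round(2))
```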

Building battery deterioration prediction model using real field data (머신러닝 기법을 이용한 납축전지 열화 예측 모델 개발)

  • Choi, Keunho;Kim, Gunwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.243-264
    • /
    • 2018
  • Although the worldwide battery market has recently been spurring the development of lithium secondary batteries, lead-acid batteries (rechargeable batteries), which perform well and can be reused, are consumed across a wide range of industries. However, lead-acid batteries have a serious problem: deterioration of the whole battery progresses quickly once degradation begins in even one of the several cells packed inside it. To overcome this problem, previous research has attempted to identify the mechanism of battery deterioration in many ways. However, most previous studies have analyzed this mechanism using data obtained in a laboratory rather than data obtained in the real world. Using real data can increase the feasibility and applicability of a study's findings. Therefore, this study aims to develop a model that predicts battery deterioration using data obtained in the real world. To this end, we collected data describing changes in battery state by attaching sensors that monitor battery condition in real time to dozens of golf carts operated on a real golf course. As a result, a total of 16,883 samples were obtained. We then developed a model that predicts a precursor phenomenon of battery deterioration by analyzing the sensor data with machine learning techniques. As initial independent variables, we used 1) inbound time of a cart, 2) outbound time of a cart, 3) duration (from outbound time to charge time), 4) charge amount, 5) used amount, 6) charge efficiency, 7) lowest temperature of battery cells 1 to 6, 8) lowest voltage of battery cells 1 to 6, 9) highest voltage of battery cells 1 to 6, 10) voltage of battery cells 1 to 6 at the beginning of operation, 11) voltage of battery cells 1 to 6 at the end of charge, 12) used amount of battery cells 1 to 6 during operation, 13) used amount of the battery during operation (Max-Min), 14) duration of battery use, and 15) highest current during operation. Since the per-cell variables (lowest temperature, lowest voltage, highest voltage, voltage at the beginning of operation, voltage at the end of charge, and used amount during operation of cells 1 to 6) take similar values across cells, we conducted principal component analysis with varimax orthogonal rotation to mitigate the multicollinearity problem. Based on the results, we created new variables by averaging the independent variables clustered together and used them as the final independent variables in place of the original ones, thereby reducing the dimensionality. We used the decision tree, logistic regression, and Bayesian network as algorithms for building prediction models, and also built models using bagging and boosting of each of them, as well as random forest. Experimental results show that the prediction model using bagging of decision trees yields the best accuracy of 89.3923%. This study has a limitation in that additional variables which affect battery deterioration, such as weather (temperature, humidity) and driving habits, were not considered; we plan to consider them in future research. Nevertheless, the battery deterioration prediction model proposed in the present study is expected to enable effective and efficient management of batteries used in the field and to dramatically reduce the costs caused by undetected battery deterioration.
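The best-performing configuration reported above, bagging of decision trees, can be sketched with scikit-learn as follows; the synthetic features stand in for the PCA-reduced sensor variables and are not the study's data.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)

# Hypothetical PCA-reduced cart/battery features and a binary label marking the
# precursor phenomenon of deterioration; the real study uses 16,883 field samples.
X = rng.normal(size=(1000, 6))
y = (X[:, 0] - 0.8 * X[:, 3] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# Bagging of decision trees, the configuration reported as most accurate (89.39%).
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)
print(cross_val_score(model, X, y, cv=5).mean().round(3))
```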

A Research in Applying Big Data and Artificial Intelligence on Defense Metadata using Multi Repository Meta-Data Management (MRMM) (국방 빅데이터/인공지능 활성화를 위한 다중메타데이터 저장소 관리시스템(MRMM) 기술 연구)

  • Shin, Philip Wootaek;Lee, Jinhee;Kim, Jeongwoo;Shin, Dongsun;Lee, Youngsang;Hwang, Seung Ho
    • Journal of Internet Computing and Services
    • /
    • v.21 no.1
    • /
    • pp.169-178
    • /
    • 2020
  • The reduction of troops and human resources, together with the need to improve combat power, has led the Korean Department of Defense to actively adopt 4th Industrial Revolution technologies (Artificial Intelligence, Big Data). Defense information systems have been developed in various ways according to the tasks and unique characteristics of each military branch. In order to take full advantage of 4th Industrial Revolution technology, it is necessary to improve the closed defense data management system. However, establishing and using data standards across all information systems for the utilization of defense big data and artificial intelligence is limited by security issues, the business characteristics of each military branch, and the difficulty of standardizing large-scale systems. Based on the interworking requirements of each system, data sharing is currently limited to direct linkage through interoperability agreements between systems. In order to implement smart defense using 4th Industrial Revolution technology, it is urgent to prepare a system that can share defense data and make good use of it. To provide this technical support, it is critical to develop Multi Repository Meta-Data Management (MRMM), which supports systematic standard management of defense data by managing the enterprise standard and the standard mappings for each system, and which promotes data interoperability through linkage between standards in accordance with the Defense Interoperability Management Development Guidelines. We introduce MRMM and implement it using vocabulary similarity computed with machine learning and statistical approaches. Based on MRMM, we expect to simplify the standardization and integration of all military databases for artificial intelligence and big data, leading to a substantial reduction in the defense budget while increasing combat power toward implementing smart defense.
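The abstract does not specify the similarity measure used; one plausible sketch of vocabulary matching between two systems' standard terms, using character n-gram TF-IDF and cosine similarity, is shown below. The terms and the method choice are assumptions for illustration, not MRMM's actual implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical standard terms from two defense information systems (illustrative).
system_a_terms = ["unit identification code", "equipment serial number", "operation date"]
system_b_terms = ["unit id code", "serial no of equipment", "date of operation"]

# Character n-gram TF-IDF is robust to abbreviations and word-order differences.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
vectors = vectorizer.fit_transform(system_a_terms + system_b_terms)

# For each term in system A, find the most similar term in system B.
similarity = cosine_similarity(vectors[:3], vectors[3:])
for i, term in enumerate(system_a_terms):
    j = similarity[i].argmax()
    print(f"{term!r} -> {system_b_terms[j]!r} (score {similarity[i, j]:.2f})")
```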