• Title/Summary/Keyword: data-based model


Improved Statistical Language Model for Context-sensitive Spelling Error Candidates (문맥의존 철자오류 후보 생성을 위한 통계적 언어모형 개선)

  • Lee, Jung-Hun;Kim, Minho;Kwon, Hyuk-Chul
    • Journal of Korea Multimedia Society / v.20 no.2 / pp.371-381 / 2017
  • The performance of statistical context-sensitive spelling error correction depends on the quality and quantity of the data used for the statistical language model. In general, the size and quality of the data in a statistical language model are proportional. However, as the amount of data increases, processing becomes slower and more storage space is required. We propose an improved statistical language model to solve this problem, together with an effective spelling error candidate generation method based on the new model. The proposed statistical model and the correction method based on it improve both the performance of spelling error correction and the processing speed.
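
As a rough illustration of how a statistical language model can rank context-sensitive spelling candidates, the sketch below scores each candidate by an add-one smoothed bigram probability given the preceding word. It is not the model proposed in the paper; the toy corpus and the function names (`train_bigram_counts`, `score`, `rank_candidates`) are assumptions made for this example.

```python
from collections import Counter


def train_bigram_counts(sentences):
    """Count unigrams and bigrams over a list of tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def score(prev_word, candidate, unigrams, bigrams, vocab_size):
    """Add-one smoothed estimate of P(candidate | prev_word)."""
    return (bigrams[(prev_word, candidate)] + 1) / (unigrams[prev_word] + vocab_size)


def rank_candidates(prev_word, candidates, unigrams, bigrams):
    """Sort candidates from most to least probable after prev_word."""
    vocab_size = len(unigrams)
    return sorted(
        candidates,
        key=lambda c: score(prev_word, c, unigrams, bigrams, vocab_size),
        reverse=True,
    )


# Hypothetical toy corpus; a real model would be trained on far more data.
corpus = [
    ["a", "piece", "of", "cake"],
    ["a", "piece", "of", "paper"],
    ["peace", "and", "quiet"],
]
unigrams, bigrams = train_bigram_counts(corpus)
ranked = rank_candidates("a", ["peace", "piece"], unigrams, bigrams)
```

The trade-off mentioned in the abstract is visible even here: both counters grow with the corpus, so a larger training set improves the estimates while increasing storage and lookup cost.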

Big Data Management in Structured Storage Based on Fintech Models for IoMT using Machine Learning Techniques (기계학습법을 이용한 IoMT 핀테크 모델을 기반으로 한 구조화 스토리지에서의 빅데이터 관리 연구)

  • Kim, Kyung-Sil
    • Advanced Industrial SCIence / v.1 no.1 / pp.7-15 / 2022
  • To support medical applications, the IoT has evolved to process large amounts of medical data, a development known as the Internet of Medical Things (IoMT). The wide range of collected medical data is stored in the cloud in a structured manner so that it can be processed. However, the huge volume of healthcare data is difficult to handle, so an appropriate scheme for structured healthcare data is needed. This paper suggests a machine learning model for processing the structured healthcare data collected from the IoMT. To process this data, the paper proposes an MTGPLSTM model, which integrates a linear regression model for processing healthcare information. With the developed model, an outlier model is implemented based on the FinTech model for the evaluation and prediction of the COVID-19 healthcare dataset collected from the IoMT. The proposed MTGPLSTM model comprises a regression model to predict and evaluate the planning scheme for preventing the spread of infection. The performance of the developed model is evaluated against different classifiers (LR, SVR, RFR, and LSTM) and for data sizes of 1 GB, 2 GB, and 3 GB. The comparative analysis shows that the proposed MTGPLSTM model achieves ~4% lower MAPE and RMSE values for the worldwide data; for China, a minimum MAPE value of 0.97 is achieved, which is ~6% lower than that of the existing classifiers.
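
The abstract above compares models by MAPE and RMSE. For reference, a minimal sketch of the two metrics under their standard definitions follows; the sample values are invented for illustration and are not taken from the paper.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical daily case counts and model predictions.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
example_mape = mape(actual, predicted)
example_rmse = rmse(actual, predicted)
```

MAPE is scale-free, which makes it convenient for comparing regions with very different case counts, whereas RMSE penalizes large individual errors more heavily.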

1D-CNN-LSTM Hybrid-Model-Based Pet Behavior Recognition through Wearable Sensor Data Augmentation

  • Hyungju Kim;Nammee Moon
    • Journal of Information Processing Systems / v.20 no.2 / pp.159-172 / 2024
  • The number of healthcare products available for pets has increased in recent times, which has prompted active research into wearable devices for pets. However, the data collected through such devices are limited by outliers and missing values owing to the anomalous and irregular characteristics of pets. Hence, we propose pet behavior recognition based on a hybrid one-dimensional convolutional neural network (CNN) and long short-term memory (LSTM) model using pet wearable devices. An Arduino-based pet wearable device was first fabricated to collect data for behavior recognition, where gyroscope and accelerometer values were collected using the device. Then, data augmentation was performed after replacing any missing values and outliers via preprocessing. At this time, the behaviors were classified into five types. To prevent bias from specific actions in the data augmentation, the number of datasets was compared and balanced, and CNN-LSTM-based deep learning was performed. The five subdivided behaviors and overall performance were then evaluated, and the overall accuracy of behavior recognition was found to be about 88.76%.

End to End Autonomous Driving System using Out-layer Removal (Out-layer를 제거한 End to End 자율주행 시스템)

  • Seung-Hyeok Jeong;Dong-Ho Yun;Sung-Hun Hong
    • Journal of Internet of Things and Convergence / v.9 no.1 / pp.65-70 / 2023
  • In this paper, we propose an autonomous driving system that uses an end-to-end model to reduce lane departure and misrecognition of traffic lights in a vision-sensor-based system. End-to-end learning can be extended to a variety of environmental conditions. Driving data were collected using a model car equipped with a vision sensor. The collected data were organized into two sets: the existing data and the data with outliers removed. Classes were formed with camera image data as input and speed and steering data as output, and training was performed using an end-to-end model. The reliability of the trained model was verified. The trained end-to-end model was then applied to the model car to predict the steering angle from image data. The training results for the model car show that the model with outliers removed performs better than the existing model.

Movie Popularity Classification Based on Support Vector Machine Combined with Social Network Analysis

  • Dorjmaa, Tserendulam;Shin, Taeksoo
    • Journal of Information Technology Services / v.16 no.3 / pp.167-183 / 2017
  • The rapid growth of information technology and mobile service platforms, such as the Internet, Google, and Facebook, has led to an abundance of data. As a result, the world is now facing a revolution in how data is searched, collected, stored, and shared. This abundance of data offers several opportunities for knowledge discovery and data mining techniques. In recent years, data mining methods for discovering and extracting available knowledge from databases have become more popular in e-commerce service fields, in particular movie recommendation. However, most classification approaches for predicting movie popularity have used only a few types of movie information, such as actor, director, rating score, language, and country. In this study, we propose a classification-based support vector machine (SVM) model for predicting movie popularity based on movie genre data and social network data. Social network analysis (SNA) is used to improve the classification accuracy. This study builds the movie network (a one-mode network) from initial data in the form of a two-mode user-to-movie network. For the proposed method, we computed degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality as centrality measures in the movie network. These four centrality values and the movie genre data were used to classify movie popularity. Logistic regression, a neural network, a naïve Bayes classifier, and a decision tree were also used as benchmark models for comparison with the performance of our proposed model. To assess the classifiers' accuracy, this study used MovieLens data as an open database. Our empirical results indicate that our proposed model with movie genre and centrality data has approximately 0% higher accuracy than other classification models with only movie genre data. The implications of our results show that our proposed model can be used for improving movie popularity classification accuracy.
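
The projection from a two-mode user-to-movie network to a one-mode movie network, followed by a centrality computation, can be sketched as below. This is a simplified illustration covering only degree centrality; it assumes two movies are linked when at least one user is connected to both, which may differ from the linking rule used in the paper, and the data and function names are hypothetical.

```python
from itertools import combinations


def project_to_movie_network(user_movies):
    """Build an undirected movie-movie adjacency map from user -> movies data."""
    neighbors = {}
    for movies in user_movies.values():
        unique = sorted(set(movies))
        for movie in unique:
            neighbors.setdefault(movie, set())  # keep isolated movies as nodes
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def degree_centrality(neighbors):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(neighbors)
    if n <= 1:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical two-mode data: which movies each user rated.
user_movies = {"u1": ["A", "B"], "u2": ["B", "C"], "u3": ["D"]}
movie_network = project_to_movie_network(user_movies)
centrality = degree_centrality(movie_network)
```

In this toy network, movie `B` is linked to both `A` and `C` and so has the highest degree centrality, while `D` is isolated. Values like these, alongside genre indicators, would form the feature vector passed to a classifier.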

A Deep Learning Application for Automated Feature Extraction in Transaction-based Machine Learning (트랜잭션 기반 머신러닝에서 특성 추출 자동화를 위한 딥러닝 응용)

  • Woo, Deock-Chae;Moon, Hyun Sil;Kwon, Suhnbeom;Cho, Yoonho
    • Journal of Information Technology Services / v.18 no.2 / pp.143-159 / 2019
  • Machine learning (ML) is a method of fitting given data to a mathematical model to derive insights or make predictions. In the age of big data, where the amount of available data increases exponentially due to the development of information technology and smart devices, ML shows high prediction performance thanks to unbiased pattern detection. Feature engineering, which generates the features that explain the problem to be solved in the ML process, has a great influence on performance, and its importance is continually emphasized. Despite this importance, it is still considered a difficult task because it requires a thorough understanding of the domain characteristics, an understanding of the source data, and an iterative procedure. Therefore, we propose methods that apply deep learning to reduce the complexity and difficulty of feature extraction and to improve the performance of ML models. The most common reason given for the superior performance of deep learning techniques in processing complex unstructured data is that, unlike other techniques, they can extract features from the source data itself. To apply these advantages to business problems, we propose deep-learning-based methods that can automatically extract features from transaction data or directly predict and classify target variables. In particular, based on the structural similarity between transaction data and text data, we applied techniques that show high performance in existing text processing. We also verified the suitability of each method according to the characteristics of the transaction data. Our study makes it possible not only to explore the feasibility of automated feature extraction but also to obtain a benchmark model that shows a certain level of performance before a human performs the feature extraction task. In addition, it is expected to provide guidelines for choosing a suitable deep learning model based on the business problem and the data characteristics.

Load Modeling based on System Identification with Kalman Filtering of Electrical Energy Consumption of Residential Air-Conditioning

  • Patcharaprakiti, Nopporn;Tripak, Kasem;Saelao, Jeerawan
    • International journal of advanced smart convergence / v.4 no.1 / pp.45-53 / 2015
  • This paper proposes mathematical load modelling, based on a system identification approach, of the energy consumption of residential air conditioning. Air conditioning is a significant piece of equipment that consumes a large amount of energy and causes the peak load of the power system, especially in summer. Demand response is one solution for decreasing load consumption and cutting the peak load to avoid reserving power supply from power plants. To operate this solution, a mathematical model of air conditioning that explains its behaviour is an essential tool. Four types of linear model are selected to explain the behaviour of this system. To obtain the models, an experimental setup collects input and output data every minute from a 9,385 BTU/h split-type air conditioner with a $25^{\circ}C$ thermostat setting in one sample house. The input data consist of solar radiation ($W/m^2$) and ambient temperature ($^{\circ}C$). The output data are the power and energy consumption of the air conditioner. Both data sets are divided into two groups, training data and validation data, to obtain an accurate model. The model is also verified against another air conditioner of a similar type by feeding in solar radiation and ambient temperature input data and comparing the output energy consumption data. The best model in terms of accuracy and model order is the output error model, with 70.78% accuracy and $17^{th}$ order. A model order reduction technique is used to reduce the model to seventh order for lower complexity, and a Kalman filtering technique is then applied to remove white Gaussian noise, improving the accuracy of the model to 72.66%. The obtained model can also be used for electrical load forecasting and for designing the optimal size of a renewable energy source, such as a photovoltaic system, to supply the air conditioning.
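
To give a sense of how Kalman filtering suppresses white Gaussian measurement noise, the sketch below implements a scalar filter with a random-walk state model. It is a deliberately simplified stand-in: the paper applies the filter to a reduced seventh-order identified model, whereas this example tracks a single value, and the noise variances and initial values are assumptions.

```python
def kalman_filter_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a state assumed to follow a random walk.

    x is the state estimate, p its error variance, and k the Kalman gain.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is carried over; uncertainty grows by the process noise.
        p = p + process_var
        # Update: blend the prediction with the measurement z.
        k = p / (p + measurement_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Hypothetical constant power reading of 5.0 (arbitrary units), starting from x0 = 0.
estimates = kalman_filter_1d([5.0, 5.0, 5.0], process_var=0.0, measurement_var=1.0)
```

With each measurement the gain `k` shrinks, so the estimate moves toward the measured level while giving progressively less weight to any single noisy sample.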

DG-based SPO tuple recognition using self-attention M-Bi-LSTM

  • Jung, Joon-young
    • ETRI Journal / v.44 no.3 / pp.438-449 / 2022
  • This study proposes a dependency grammar-based self-attention multilayered bidirectional long short-term memory (DG-M-Bi-LSTM) model for subject-predicate-object (SPO) tuple recognition from natural language (NL) sentences. To add recent knowledge to the knowledge base autonomously, it is essential to extract knowledge from numerous NL data. Therefore, this study proposes a high-accuracy SPO tuple recognition model that requires a small amount of learning data to extract knowledge from NL sentences. The accuracy of SPO tuple recognition using DG-M-Bi-LSTM is compared with that using NL-based self-attention multilayered bidirectional LSTM, DG-based bidirectional encoder representations from transformers (BERT), and NL-based BERT to evaluate its effectiveness. The DG-M-Bi-LSTM model achieves the best results in terms of recognition accuracy for extracting SPO tuples from NL sentences even if it has fewer deep neural network (DNN) parameters than BERT. In particular, its accuracy is better than that of BERT when the learning data are limited. Additionally, its pretrained DNN parameters can be applied to other domains because it learns the structural relations in NL sentences.

XML-Based Network Services for Real-Time Process Data (실시간 공정 데이터를 위한 XML 기반 네트워크 서비스)

  • Choo, Young-Yeol;Song, Myoung-Gyu
    • Journal of Institute of Control, Robotics and Systems / v.14 no.2 / pp.184-190 / 2008
  • This paper describes a message model based on XML (eXtensible Markup Language) for presenting real-time data from sensors and instruments in manufacturing processes as a web service. HTML (HyperText Markup Language) is suitable for displaying non-real-time multimedia data on the web but inadequate for describing real-time data from process control plants. For an XML-based web service for process data, an XML format for data presentation was proposed after investigating the data of various instruments at steel-making plants. Because the transmission delay inevitably caused by increased message length and the processing delay from transforming raw data into the defined format are critical for the operation of a real-time system, performance was evaluated by simulation. The simulation assumed two implementation models for the transformation function. In one model, the transformation was done at an SCC (Supervisory Control Computer) after it received real-time data from the instruments. In the other model, the transformation was carried out at the instruments before the data were transmitted to the SCC. Various tests were conducted under different conditions of offered load and data length, and their results are described.
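
The transformation of a raw sensor reading into an XML message, and the message-length overhead it introduces, can be sketched with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`processData`, `reading`, `sensorId`, `value`, `unit`) and the sample reading are hypothetical; the abstract does not give the paper's actual schema.

```python
import xml.etree.ElementTree as ET


def encode_reading(sensor_id, timestamp, value, unit):
    """Wrap one sensor reading in a small XML message (hypothetical schema)."""
    root = ET.Element("processData")
    reading = ET.SubElement(root, "reading", sensorId=sensor_id, timestamp=timestamp)
    ET.SubElement(reading, "value", unit=unit).text = repr(value)
    return ET.tostring(root, encoding="unicode")


def decode_reading(xml_text):
    """Parse the message back into a plain dictionary."""
    reading = ET.fromstring(xml_text).find("reading")
    value = reading.find("value")
    return {
        "sensor_id": reading.get("sensorId"),
        "timestamp": reading.get("timestamp"),
        "value": float(value.text),
        "unit": value.get("unit"),
    }


# Compare against a compact comma-separated form of the same reading.
raw = "T-101,2008-01-01T00:00:00,1523.5,C"
message = encode_reading("T-101", "2008-01-01T00:00:00", 1523.5, "C")
overhead_chars = len(message) - len(raw)
```

The XML form is self-describing but longer than the raw record, which is the source of the added transmission delay. Where `encode_reading` runs (at the instrument or at the SCC) corresponds to the two implementation models compared in the simulation.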

Design and Evaluation of a Quorum-Based Adaptive Dissemination Algorithm for Critical Data in IoTs (IoT에서 중요한 데이터를 위한 쿼럼 기반 적응적 전파 알고리즘의 설계 및 평가)

  • Bae, Ihn Han;Noh, Heung Tae
    • Journal of Korea Multimedia Society / v.22 no.8 / pp.913-922 / 2019
  • The Internet of Things (IoT) envisions smart objects collecting and sharing data at a massive scale via the Internet. One challenging issue is how to disseminate data efficiently to the relevant data-consuming objects. In such a massive IoT network, mission-critical data dissemination imposes constraints on the message transfer delay between objects. Because of the low power and limited communication range of IoT objects, data is relayed over multiple hops before arriving at the destination. In this paper, we propose a quorum-based adaptive dissemination algorithm (QADA) for critical data in the monitoring-based applications of massive IoTs. To design QADA, we first design a new stepped-triangular grid structure (sT-grid) that supports data dissemination. We then construct a triangular grid overlay in the fog layer above the lower IoT layer and propose a publish/subscribe data dissemination algorithm that adaptively uses triangular grid (T-grid) and sT-grid quorums, depending on how mission-critical the data is, to disseminate the critical data over the constructed overlay. Finally, we evaluate its performance using an analytical model.