• Title/Summary/Keyword: Intelligent Data Analysis


Improvement of generalization of linear model through data augmentation based on Central Limit Theorem (데이터 증가를 통한 선형 모델의 일반화 성능 개량 (중심극한정리를 기반으로))

  • Hwang, Doohwan
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.2
    • /
    • pp.19-31
    • /
    • 2022
  • In machine learning, we usually divide the entire data set into training data and test data, train the model on the training data, and use the test data to determine the accuracy and generalization performance of the model. A model with low generalization performance shows significantly reduced prediction accuracy on new data and is said to be overfit. This study proposes a method of generating training data based on the central limit theorem, combining it with the existing training data to increase normality, and using the combined data to train models and improve generalization performance. To this end, data were generated from the sample mean and standard deviation of each feature, exploiting the central limit theorem, and new training data were constructed by combining them with the existing training data. To determine the degree of increase in normality, the Kolmogorov-Smirnov normality test was conducted, and it confirmed that the new training data showed increased normality compared to the existing data. Generalization performance was measured as the difference in prediction accuracy between training data and test data. Applying this approach to K-Nearest Neighbors (KNN), Logistic Regression, and Linear Discriminant Analysis (LDA) confirmed that generalization performance improved for KNN, a non-parametric technique, and for LDA, which assumes normality in model building.
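A minimal sketch of one plausible reading of the CLT-based augmentation described above: synthetic rows are built as means of small random sub-samples within each class, so their per-feature distributions approach normality. The function names, group size, and toy data are hypothetical, not the paper's setup.

```python
import numpy as np
from scipy import stats

def clt_augment(X, y, n_new_per_class=200, group_size=30, seed=0):
    """Append synthetic rows whose features are sub-sample means (central limit theorem)."""
    rng = np.random.default_rng(seed)
    X_new, y_new = [], []
    for label in np.unique(y):
        Xc = X[y == label]
        for _ in range(n_new_per_class):
            idx = rng.integers(0, len(Xc), size=group_size)
            X_new.append(Xc[idx].mean(axis=0))   # sample mean per feature
            y_new.append(label)
    return np.vstack([X, np.array(X_new)]), np.concatenate([y, y_new])

# Toy demonstration with a skewed (non-normal) feature:
rng = np.random.default_rng(1)
X = rng.exponential(scale=2.0, size=(300, 1))
y = (X[:, 0] > 2.0).astype(int)
X_aug, y_aug = clt_augment(X, y)

z = lambda v: (v - v.mean()) / v.std()   # standardize before the K-S check
for label in (0, 1):
    orig_p = stats.kstest(z(X[y == label, 0]), "norm").pvalue
    synth_p = stats.kstest(z(X_aug[300:][y_aug[300:] == label, 0]), "norm").pvalue
    print(f"class {label}: K-S p-value original={orig_p:.3f}, synthetic={synth_p:.3f}")
# A classifier (e.g., scikit-learn KNN or LDA) would then be trained on (X_aug, y_aug) and
# the train/test accuracy gap compared against training on the original data alone.
```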

Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary (주가지수 방향성 예측을 위한 주제지향 감성사전 구축 방안)

  • Yu, Eunji;Kim, Yoosin;Kim, Namgyu;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.95-110
    • /
    • 2013
  • Recently, the amount of unstructured data being generated through a variety of social media has been increasing rapidly, resulting in the increasing need to collect, store, search for, analyze, and visualize this data. This kind of data cannot be handled appropriately by using the traditional methodologies usually used for analyzing structured data because of its vast volume and unstructured nature. In this situation, many attempts are being made to analyze unstructured data such as text files and log files through various commercial or noncommercial analytical tools. Among the various contemporary issues dealt with in the literature of unstructured text data analysis, the concepts and techniques of opinion mining have been attracting much attention from pioneer researchers and business practitioners. Opinion mining or sentiment analysis refers to a series of processes that analyze participants' opinions, sentiments, evaluations, attitudes, and emotions about selected products, services, organizations, social issues, and so on. In other words, many attempts based on various opinion mining techniques are being made to resolve complicated issues that could not have otherwise been solved by existing traditional approaches. One of the most representative attempts using the opinion mining technique may be the recent research that proposed an intelligent model for predicting the direction of the stock index. This model works mainly on the basis of opinions extracted from an overwhelming number of economic news reports. News content published on various media is obviously a traditional example of unstructured text data. Every day, a large volume of new content is created, digitalized, and subsequently distributed to us via online or offline channels. Many studies have revealed that we make better decisions on political, economic, and social issues by analyzing news and other related information. In this sense, we expect to predict the fluctuation of stock markets partly by analyzing the relationship between economic news reports and the pattern of stock prices. So far, in the literature on opinion mining, most studies including ours have utilized a sentiment dictionary to elicit sentiment polarity or sentiment value from a large number of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values. Sentiment classifiers refer to the dictionary to formulate the sentiment polarity of words, sentences in a document, and the whole document. However, most traditional approaches have common limitations in that they do not consider the flexibility of sentiment polarity, that is, the sentiment polarity or sentiment value of a word is fixed and cannot be changed in a traditional sentiment dictionary. In the real world, however, the sentiment polarity of a word can vary depending on the time, situation, and purpose of the analysis. It can also be contradictory in nature. The flexibility of sentiment polarity motivated us to conduct this study. In this paper, we have stated that sentiment polarity should be assigned, not merely on the basis of the inherent meaning of a word but on the basis of its ad hoc meaning within a particular context. To implement our idea, we presented an intelligent investment decision-support model based on opinion mining that performs the scraping and parsing of massive volumes of economic news on the web, tags sentiment words, classifies the sentiment polarity of the news, and finally predicts the direction of the next day's stock index. In addition, we applied a domain-specific sentiment dictionary instead of a general-purpose one to classify each piece of news as either positive or negative. For the purpose of performance evaluation, we performed intensive experiments and investigated the prediction accuracy of our model. For the experiments to predict the direction of the stock index, we gathered and analyzed 1,072 articles about stock markets published by "M" and "E" media between July 2011 and September 2011.
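A minimal sketch of how a domain-specific sentiment dictionary can score news items and vote on the next day's index direction. The dictionary entries, tokenization, thresholds, and voting rule are illustrative only; they are not the paper's dictionary or model.

```python
# Hypothetical domain-specific dictionary: token -> polarity value. In a stock-market
# domain, words such as "surge" may carry a different polarity than in general usage.
DOMAIN_DICT = {"surge": 1.0, "rally": 1.0, "plunge": -1.0, "default": -1.0, "uncertainty": -0.5}

def score_article(tokens, dictionary=DOMAIN_DICT):
    """Sum the sentiment values of dictionary words found in a tokenized article."""
    return sum(dictionary.get(t, 0.0) for t in tokens)

def predict_direction(articles):
    """Classify each article as positive or negative and vote on the next day's direction."""
    votes = sum(1 if score_article(a) > 0 else -1 for a in articles)
    return "up" if votes > 0 else "down"

# Usage with toy, pre-tokenized articles (ties fall to "down" here; a real model would calibrate this):
news = [["stocks", "surge", "on", "rally"], ["default", "uncertainty", "weighs"]]
print(predict_direction(news))
```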

Design of the student Career prediction program using the decision tree algorithm (의사결정트리 알고리즘을 이용한 학생진로 예측 프로그램의 설계)

  • Kim, Geun-Ho;Jeong, Chong-In;Kim, Chang-Seok;Kang, Shin-Chun;Kim, Eui-Jeong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2018.05a
    • /
    • pp.332-335
    • /
    • 2018
  • In recent years, artificial intelligence using big data has become a major issue in IT. Various studies are being conducted on services and technologies to handle big data effectively. In the educational field, there is big data about students, but such data is merely collected, looked up, and stored. In the future, artificial intelligence, machine learning, and statistical analysis should be used extensively to find meaningful rules, patterns, and relationships in the big data of the educational field and to produce intelligent and useful information for actual students. Accordingly, this study aims to design a program that predicts the career of students using a decision tree algorithm based on data from classroom observations. The career prediction program is expected to help present application paths in student counseling and to provide guidance on classroom behavior and direction based on the desired courses.

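A minimal sketch of the decision-tree idea, assuming hypothetical classroom-observation features (attendance rate, activity counts) and illustrative career labels; the paper's actual attributes and data are not shown.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical observation records: feature names and labels are illustrative only.
data = pd.DataFrame({
    "attendance_rate": [0.95, 0.80, 0.99, 0.70, 0.90, 0.85],
    "science_activity": [5, 1, 4, 0, 3, 2],
    "arts_activity":    [0, 4, 1, 5, 2, 3],
    "career":           ["engineering", "arts", "engineering", "arts", "engineering", "arts"],
})

X = data.drop(columns="career")
y = data["career"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(export_text(tree, feature_names=list(X.columns)))  # human-readable rules usable in counseling
print("test accuracy:", tree.score(X_test, y_test))
```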

A Study on the Fault Data Transmission through the Web using the XML Web Service and the Fault Type Determination of the Fault Data Received from the Web (XML Web Service를 이용한 고장 데이터의 웹 전송과 웹으로 수신된 고장 데이터의 고장 유형 판별에 관한 연구)

  • Kim In-Su;Hong Jung-Gi;Lee Hahk-Sung
    • The Transactions of the Korean Institute of Electrical Engineers A
    • /
    • v.55 no.1
    • /
    • pp.18-23
    • /
    • 2006
  • Recently, as the power system has become massive and complicated, most faults bring on severe proliferation effects. Because of the complexity of the power system, it is not easy to analyze faults, i.e., to calculate the current flows under fault conditions, so much research has been performed in this area. As a result of those efforts, the protective equipment for a power system has been designed to operate properly and without damage when the highest possible fault current is flowing in the power system. Most fault data can also be acquired from intelligent protection equipment. However, the fault data saved in such equipment do not always include fault type information; without knowledge of fault analysis, the data become useless. So this paper presents three topics to increase their reusability, as follows. First, it describes fault data using XML (eXtensible Markup Language), producing a well-formed and valid document that complies with the suggested XML DTD (Document Type Definition). This paper suggests a standard DTD to describe power system faults; if an XML document describing a power system fault is validated against the suggested DTD, it can be used in any application. Second, it sends the data through the web using the XML web service. Last, it presents a rapid and accurate algorithm for determining the fault type of the fault data received from the web. Ultimately, the client that requests the server to analyze fault data is provided with correct information about what kind of fault occurred.
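A minimal sketch of the two ends of this pipeline: a fault record serialized as XML and a rule-based fault-type decision on the receiving side. The element names, current thresholds, and classification rules are hypothetical; the paper's actual DTD and determination algorithm are not reproduced.

```python
import xml.etree.ElementTree as ET

def build_fault_xml(ia, ib, ic, i0):
    """Serialize phase and zero-sequence currents as an XML payload for the web service."""
    root = ET.Element("fault_record")
    for name, value in [("ia", ia), ("ib", ib), ("ic", ic), ("i0", i0)]:
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

def determine_fault_type(xml_text, overcurrent=1000.0, ground=100.0):
    """Very rough classification of the fault type from the received XML record."""
    rec = ET.fromstring(xml_text)
    ia, ib, ic, i0 = (float(rec.findtext(k)) for k in ("ia", "ib", "ic", "i0"))
    faulted = [p for p, i in zip("ABC", (ia, ib, ic)) if i > overcurrent]
    grounded = i0 > ground
    if len(faulted) == 1 and grounded:
        return f"single line-to-ground ({faulted[0]}-G)"
    if len(faulted) == 2:
        return "line-to-line" + ("-to-ground" if grounded else "")
    if len(faulted) == 3:
        return "three-phase"
    return "no fault detected"

payload = build_fault_xml(ia=2500.0, ib=80.0, ic=75.0, i0=900.0)
print(determine_fault_type(payload))   # -> single line-to-ground (A-G)
```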

IP-Based Heterogeneous Network Interface Gateway for IoT Big Data Collection (IoT 빅데이터 수집을 위한 IP기반 이기종 네트워크 인터페이스 연동 게이트웨이)

  • Kang, Jiheon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.23 no.2
    • /
    • pp.173-178
    • /
    • 2019
  • Recently, the types and amount of data generated, collected, and measured in IoT settings such as smart homes, security, and factories have been increasing. The technologies for IoT services include sensor devices to measure the desired data, embedded software to control the devices (e.g., signal processing), wireless network protocols to transmit and receive the measured data, and big data and AI-based analysis. In this paper, we focus on developing a gateway for interfacing the heterogeneous sensor network protocols used in various IoT devices, and we propose a heterogeneous network interface IoT gateway. We utilized an OpenWrt-based wireless router and used the 6LoWPAN stack for IP-based communication via BLE and IEEE 802.15.4 adapters. We also developed software that converts Z-Wave and LoRa packets into IP packets using our Python-based middleware. We expect the IoT gateway to be used as an effective device for collecting IoT big data.
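A minimal sketch of the gateway's conversion role under stated assumptions: frames arriving from a non-IP adapter (e.g., Z-Wave or LoRa) are wrapped in a common envelope and re-emitted as UDP/IP datagrams toward a big-data collector. The collector address, envelope fields, and payload are hypothetical; the paper's actual middleware, drivers, and framing are not shown.

```python
import json
import socket
import time

COLLECTOR = ("203.0.113.10", 5683)   # hypothetical collector address (documentation range)

def to_ip_message(adapter, device_id, raw_payload):
    """Wrap a raw sensor payload in a JSON envelope so every adapter looks the same upstream."""
    return json.dumps({
        "adapter": adapter,            # "zwave", "lora", "ble", "802.15.4", ...
        "device": device_id,
        "ts": time.time(),
        "payload": raw_payload.hex(),  # keep the original bytes intact for later decoding
    }).encode()

def forward(adapter, device_id, raw_payload, sock=None):
    """Send the converted message as a UDP/IP datagram to the collector."""
    sock = sock or socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(to_ip_message(adapter, device_id, raw_payload), COLLECTOR)

# Usage with a fake frame pulled from an adapter driver:
forward("zwave", "node-07", b"\x21\x03\x64")   # e.g., a temperature report
```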

Development of the Regulatory Impact Analysis Framework for the Convergence Industry: Case Study on Regulatory Issues by Emerging Industry (융합산업 규제영향분석 프레임워크 개발: 신산업 분야별 규제이슈 사례 연구)

  • Song, Hye-Lim;Seo, Bong-Goon;Cho, Sung-Min
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.199-230
    • /
    • 2021
  • Innovative new products and services are being launched through convergence between heterogeneous industries, and social interest and investment in convergence industries such as AI, big-data-based future cars, and robots are continuously increasing. However, in the process of commercializing new convergence products and services, there are many cases where they do not conform to the existing regulatory and legal system, which causes many difficulties for companies launching their products and services into the market. In response to these industrial changes, the current government is promoting improvement of the existing regulatory mechanisms applied to the relevant industries along with expanded investment in new industries. Against this background, this study aimed to analyze the existing regulatory system that obstructs the market entry of innovative new products and services in order to preemptively predict regulatory issues that will arise in emerging industries. In addition, it was intended to establish a regulatory impact analysis system to evaluate adequacy and prepare improvement measures. The flow of this study is divided into three parts. In the first part, previous studies on regulatory impact analysis and evaluation systems are investigated; these were used as basic data for the development direction of the regulatory impact framework, its indicators, and its items. In the second part, on development of the regulatory impact analysis framework, indicators and items are developed based on the previously investigated data and applied to each stage of the framework. In the last part, a case study is presented that resolves the regulatory issues faced by actual companies by applying the developed regulatory impact analysis framework. The case study covered the autonomous/electric vehicle industry and the Internet of Things (IoT) industry, because these are among the emerging industries in which the Korean government has recently shown the most interest and are judged to be most relevant to the realization of an intelligent information society. Specifically, the regulatory impact analysis framework proposed in this study consists of a total of five steps. The first step is to identify the industrial size of the target products and services, related policies, and regulatory issues. In the second stage, regulatory issues are discovered through review of regulatory improvement items for each stage of commercialization (planning, production, commercialization). In the next step, factors related to regulatory compliance costs are derived and the costs incurred for existing regulatory compliance are calculated. In the fourth stage, an alternative is prepared by gathering opinions of the relevant industry and experts in the field, and the necessity, validity, and adequacy of the alternative are reviewed. Finally, in the final stage, the adopted alternatives are formulated so that they can be applied to legislation, and the alternatives are reviewed by legal experts. The implications of this study are summarized as follows. From a theoretical point of view, it is meaningful in that it clearly presents a series of procedures for regulatory impact analysis as a framework. Although previous studies mainly discussed the importance and necessity of regulatory impact analysis, this study presents a systematic framework that considers the various factors required for regulatory impact analysis suggested by prior studies.
From a practical point of view, this study has significance in that it was applied to actual regulatory issues based on the regulatory impact analysis framework proposed above. The results of this study show that proposals related to regulatory issues were submitted to government departments and finally the current law was revised, suggesting that the framework proposed in this study can be an effective way to resolve regulatory issues. It is expected that the regulatory impact analysis framework proposed in this study will be a meaningful guideline for technology policy researchers and policy makers in the future.
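Most of the framework is procedural, but the third stage involves a calculation. A minimal sketch of that compliance cost estimate follows, with purely hypothetical cost factors and figures; the paper's actual indicators and items are not shown.

```python
from dataclasses import dataclass

@dataclass
class CostFactor:
    name: str
    annual_cost: float   # e.g., certification fees, testing, reporting labor (KRW per firm)
    affected_firms: int

def total_compliance_cost(factors):
    """Industry-wide annual cost of complying with the existing regulation."""
    return sum(f.annual_cost * f.affected_firms for f in factors)

# Hypothetical factors for an emerging-industry regulation:
factors = [
    CostFactor("safety certification", 12_000_000, 40),
    CostFactor("periodic reporting",    1_500_000, 40),
]
print(f"estimated annual compliance cost: {total_compliance_cost(factors):,.0f} KRW")
```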

A Comparative Analysis of the Changes in Perception of the Fourth Industrial Revolution: Focusing on Analyzing Social Media Data (4차 산업혁명에 대한 인식 변화 비교 분석: 소셜 미디어 데이터 분석을 중심으로)

  • You, Jae Eun;Choi, Jong Woo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.11
    • /
    • pp.367-376
    • /
    • 2020
  • The fourth industrial revolution will greatly contribute to the entry of objects into an intelligent society through technologies such as big data and artificial intelligence. Through the revolution, it has become possible to understand human behavior and awareness, and artificial intelligence has established itself as a key tool in various fields such as medicine and science. However, the fourth industrial revolution has a negative side as well as a positive outlook. In this study, an analysis was conducted using text mining techniques on unstructured big data collected through social media. We looked at keywords related to the fourth industrial revolution by year (2016, 2017, and 2018) and examined the meaning of each keyword. In addition, we examined how these keywords changed from year to year and used R to conduct a keyword analysis, identifying the flow of public perception of the fourth industrial revolution through the flow of its associated keywords. Finally, people's perceptions of the fourth industrial revolution were identified by looking at the positive and negative sentiment related to it by year. The analysis showed that negative opinions declined year after year, with a more positive outlook on the future.
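The study performed its keyword analysis in R; for consistency with the other sketches here, this is an equivalent Python sketch of year-by-year keyword counting and dictionary-based sentiment tallying. The toy posts and word lists are illustrative only.

```python
from collections import Counter

# Toy collected posts grouped by year (hypothetical content):
posts = {
    2016: ["ai will replace jobs", "big data hype"],
    2017: ["ai helps medicine", "worried about jobs"],
    2018: ["smart factory success", "ai helps science"],
}
POSITIVE = {"helps", "success", "smart"}
NEGATIVE = {"replace", "worried", "hype"}

for year, texts in posts.items():
    tokens = [t for text in texts for t in text.split()]
    top = Counter(tokens).most_common(3)                 # most frequent keywords for the year
    pos = sum(t in POSITIVE for t in tokens)             # simple dictionary-based sentiment tally
    neg = sum(t in NEGATIVE for t in tokens)
    print(year, "top keywords:", top, "| positive:", pos, "negative:", neg)
```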

Study on Queue Length Estimation using GPS Trajectory Data (GPS 데이터를 이용한 대기행렬길이 산출에 관한 연구)

  • Lee, Yong-Ju;Hwang, Jae-Seong;Lee, Choul-Ki
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.15 no.3
    • /
    • pp.45-51
    • /
    • 2016
  • The existing real-time signal control system has shown typical problems: oversaturated conditions and the limitations of point detection based on loop detectors. For that reason, an advanced next-generation signal control system is required. This study aimed at calculating queue length for the next-generation signal control system so that the intersection queue, rather than real-time through-traffic volume, can be used as a basic parameter of signal control. The scope was set to focus on overflow saturation conditions, which have appeared as a limit of the existing system. Real-time location information of individual vehicles collected as GPS data was converted into coordinates, and a shock wave model was applied using a linear equation extracted by least-squares regression. The calculated queue length was compared with the link length; if the queue length exceeds the link, the queue of the downstream intersection is included in the queue length, since the upstream queued vehicles are judged to affect the downstream intersection. To judge the reliability of the extracted queue length, a correlation analysis with link travel time was conducted; both links showed values over 0.9, indicating that they are highly correlated. This research is significant in that it uses real-time data to calculate queue length and contributes to the signal control system.
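A minimal sketch of the least-squares shock wave idea under stated assumptions: fit a line to the (time, distance upstream of the stop line) points at which GPS probe vehicles join the back of the queue, then read the queue length off that line and compare it with the link length. All numbers are illustrative; the paper's coordinate conversion and exact formulation are not reproduced.

```python
import numpy as np

# Time (s) and distance upstream of the stop line (m) at which each probe vehicle stopped:
t = np.array([ 5.0, 12.0, 20.0, 27.0, 35.0])
x = np.array([15.0, 40.0, 70.0, 95.0, 130.0])

slope, intercept = np.polyfit(t, x, 1)   # least-squares line: back of queue over time
shockwave_speed = slope                   # m/s, speed at which the queue grows upstream

def queue_length(at_time, link_length=120.0):
    """Queue length at a given time, with any excess attributed to the adjacent intersection."""
    q = slope * at_time + intercept
    if q > link_length:
        return link_length, q - link_length   # queue spills beyond the link
    return q, 0.0

print("shock wave speed (m/s):", round(shockwave_speed, 2))
print("queue at t=40s (within link, spillback):", queue_length(40.0))
```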

Spatiotemporal Traffic Density Estimation Based on Low Frequency ADAS Probe Data on Freeway (표본 ADAS 차두거리 기반 연속류 시공간적 교통밀도 추정)

  • Lim, Donghyun;Ko, Eunjeong;Seo, Younghoon;Kim, Hyungjoo
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.19 no.6
    • /
    • pp.208-221
    • /
    • 2020
  • The objective of this study is to estimate and analyze the traffic density of continuous flow using the trajectories of individual vehicles and the headways between sample probe vehicles and the vehicles ahead of them, obtained from the ADAS (Advanced Driver Assistance System) installed in the probe vehicles. In the past, the traffic density of continuous traffic flow was mainly estimated by processing data such as traffic volume, speed, and occupancy collected from vehicle detection systems, or by counting the number of vehicles directly using video information such as CCTV. These methods have spatial limitations in estimating traffic density and low estimation reliability in the event of traffic congestion. To overcome the limitations of prior research, this study uses individual vehicle trajectory data and vehicle headway information collected from ADAS to detect the spacing on the road and to estimate the spatiotemporal traffic density using the generalized density formula. An analysis of the accuracy of the traffic density estimates according to the sampling rate of ADAS vehicles showed that, at the expected sampling rate of 30%, the estimates were approximately 90% consistent with the actual traffic density. This study contributes to efficient traffic operation and management by estimating reliable traffic density in road situations where ADAS and autonomous vehicles are mixed.
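A minimal sketch, reading "generalized density" as Edie's generalized definition: density equals the total time vehicles spend in a space-time region divided by the region's area. Scaling the probe total by the inverse sampling rate is an assumption made here for illustration; the trajectories and the 30% rate are toy values, not the paper's data.

```python
import numpy as np

def edie_density(trajectories, t0, t1, x0, x1, sampling_rate=0.3, dt=0.1):
    """k = total time spent in the space-time region / region area, scaled to the full population.

    trajectories: list of (times, positions) arrays for each sampled probe vehicle.
    """
    area = (t1 - t0) * (x1 - x0)                 # s * m
    total_time = 0.0
    for times, positions in trajectories:
        grid = np.arange(t0, t1, dt)
        xs = np.interp(grid, times, positions)   # resample the trajectory on a fine time grid
        inside = (xs >= x0) & (xs < x1)
        total_time += inside.sum() * dt          # time this probe spent inside the region
    return (total_time / sampling_rate) / area   # vehicles per metre

# Two toy probe trajectories crossing a 200 m section during a 60 s window:
trajs = [(np.array([0.0, 60.0]), np.array([0.0, 400.0])),
         (np.array([0.0, 60.0]), np.array([-50.0, 350.0]))]
k = edie_density(trajs, t0=0, t1=60, x0=100, x1=300, sampling_rate=0.3)
print("estimated density:", round(k * 1000, 1), "veh/km")
```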

A Study on the Analysis of Dangerous Driving Behavior and Traffic Accident Risk according to the Operation Characteristics of Commercial Freight Vehicles (사업용 화물자동차 운행특성에 따른 위험운전행동 및 교통사고 위험도 분석 연구)

  • Park, Jin soo;Lee, Soo beom;Park, Jun tae
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.21 no.2
    • /
    • pp.152-166
    • /
    • 2022
  • This study analyzed the causal relationships among the operating characteristics of commercial freight vehicles, dangerous driving behaviors, and traffic accident risk, applying existing accident causation and prevention theory. Data related to drivers' working characteristics, driving experience, driving ability, driving psychology, vehicle characteristics (size), dangerous driving behavior, and traffic accidents were collected from 303 commercial freight vehicle drivers. Working characteristics and dangerous driving behavior data were based on the drivers' digital driving records, and the traffic accident data were based on insurance accident records reflecting actual traffic accidents. First, a structural equation model was built and verified using model fitness indices. The developed model was then used to analyze the causal relationships between multiple independent and dependent variables simultaneously. Four dangerous driving behaviors (sudden deceleration, sudden acceleration, sudden passing, and sudden stopping) were found to be highly related to traffic accidents. The results further indicate that it is necessary to establish a safety management policy and intensive management for small-sized freight vehicles, drivers with insufficient driving ability, and drivers with dangerous driving behaviors. Such policy and management are expected to reduce traffic accidents effectively.
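The study fits a structural equation model; as a simplified stand-in (not the authors' SEM), two chained OLS regressions sketch the hypothesized path from working characteristics to dangerous driving behavior to accident risk. The column names and synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy data for 303 drivers, generated so that longer hours and less experience raise the
# number of risky driving events, and risky events raise accident counts.
rng = np.random.default_rng(0)
n = 303
df = pd.DataFrame({"daily_hours": rng.normal(9, 2, n),
                   "experience_yrs": rng.normal(10, 5, n)})
df["risky_events"] = 2 * df["daily_hours"] - 0.5 * df["experience_yrs"] + rng.normal(0, 3, n)
df["accidents"] = 0.1 * df["risky_events"] + rng.normal(0, 1, n)

path1 = smf.ols("risky_events ~ daily_hours + experience_yrs", data=df).fit()
path2 = smf.ols("accidents ~ risky_events", data=df).fit()
print(path1.params)   # effect of working characteristics on dangerous driving behavior
print(path2.params)   # effect of dangerous driving behavior on accident risk
```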