• Title/Summary/Keyword: big data analysis techniques (빅데이터 분석 기법)


Sentiment analysis on movie review through building modified sentiment dictionary by movie genre (영역별 맞춤형 감성사전 구축을 통한 영화리뷰 감성분석)

  • Lee, Sang Hoon;Cui, Jing;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.22 no.2 / pp.97-113 / 2016
  • Due to the growth of internet data and the rapid development of internet technology, "big data" analysis is actively conducted to analyze enormous amounts of data for various purposes. In recent years especially, a number of studies have applied text mining techniques to overcome the limitations of existing structured data analysis. Sentiment analysis, a branch of text mining, has been actively studied as a way to score opinions based on the polarity distribution of the words in a document. Sentiment analysis usually relies on a sentiment dictionary that records the positivity and negativity of vocabulary. As part of this line of research, this study constructs a sentiment dictionary customized to a specific data domain. A common sentiment dictionary that ignores the characteristics of the data domain cannot reflect contextual expressions used only in that domain, so a dictionary customized to the domain can be expected to improve sentiment analysis performance. This study therefore proposes a way to construct a customized dictionary that reflects the characteristics of the data domain. Specifically, movie review data are divided by genre, genre-customized dictionaries are constructed, and their performance in sentiment analysis is compared with that of a common sentiment dictionary. IMDb data are chosen as the subject of analysis, and movie reviews are categorized by genre. Six IMDb genres are selected: 'action', 'animation', 'comedy', 'drama', 'horror', and 'sci-fi'. The five highest-ranked and five lowest-ranked movies in each genre are selected as the training data set, and about two years of movie data, from September 2012 to June 2014, are collected as the test data set.
Using the SO-PMI (Semantic Orientation from Pointwise Mutual Information) technique, we build a customized sentiment dictionary for each genre and compare prediction accuracy on review ratings. The analysis shows that prediction using the customized dictionaries improves accuracy: the overall improvement is 2.82% and is statistically significant. The customized dictionary for 'sci-fi' yields the largest accuracy improvement of the six genres. Although this study shows the usefulness of customized dictionaries in sentiment analysis, further studies are required to generalize the results. This study considers only adjectives as additional terms in the customized sentiment dictionary. Other parts of speech, such as verbs and adverbs, could be considered to improve sentiment analysis performance. The customized sentiment dictionary also needs to be applied to other domains, such as product reviews.
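The SO-PMI scoring step can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation. It follows the common formulation SO(w) = Σ PMI(w, p) − Σ PMI(w, n), where the first sum runs over positive seed words p and the second over negative seed words n. It treats two terms as co-occurring when they appear in the same review, and it adds a small smoothing constant so the logarithm never sees zero. The seed words, the smoothing value, the document-level co-occurrence window, and the name `so_pmi` are all assumptions for illustration.

```python
import math

def so_pmi(word, docs, pos_seeds, neg_seeds, smoothing=0.01):
    """Semantic orientation of `word`: sum of PMI with positive seeds minus
    sum of PMI with negative seeds. `docs` is a list of token lists; two terms
    co-occur if they appear in the same document (an assumption of this sketch)."""
    n_docs = len(docs)
    doc_sets = [set(d) for d in docs]

    def doc_freq(*terms):
        # Documents containing all the given terms, smoothed so log2 never sees 0.
        return sum(1 for d in doc_sets if all(t in d for t in terms)) + smoothing

    def pmi(a, b):
        p_ab = doc_freq(a, b) / n_docs
        p_a = doc_freq(a) / n_docs
        p_b = doc_freq(b) / n_docs
        return math.log2(p_ab / (p_a * p_b))

    return (sum(pmi(word, p) for p in pos_seeds)
            - sum(pmi(word, n) for n in neg_seeds))

# Hypothetical tokenized reviews from one genre and hypothetical seed words.
reviews = [["thrilling", "great", "plot"], ["thrilling", "great"],
           ["boring", "awful", "plot"], ["boring", "awful"]]
for adj in ["thrilling", "boring", "plot"]:
    print(adj, round(so_pmi(adj, reviews, ["great"], ["awful"]), 2))
```

A genre dictionary could then be built by running candidate adjectives from that genre's training reviews through `so_pmi`. A positive score would mark a term as positive and a negative score as negative, with scores near zero left out.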

Research on Location Selection Method Development for Storing Service Parts using Data Analytics (데이터 분석 기법을 활용한 서비스 부품의 저장 위치 선정 방안 수립 연구)

  • Son, Jin-Ho;Shin, KwangSup
    • The Journal of Bigdata / v.2 no.2 / pp.33-46 / 2017
  • Compared with general finished products, service parts are difficult to manage systematically because of their variety, uncertain demand, and the need for quick response. In particular, order picking is recognized as the most important task in a service-parts warehouse, because the inbound cycle of service parts is long while the outbound cycle is relatively short. However, there is a limit to how far warehouse work efficiency can be increased, because the cycle, frequency, and quantity of outbound requests depend on the inherent characteristics of each part. This research classifies part types using various detailed data and develops a demand prediction model. On that basis, it presents a storage-location method that minimizes the total order-picking distance for both inbound and outbound operations. Based on this study, I expect both work efficiency and space utilization to improve without any change in the warehouse's inbound and outbound quantities.
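The abstract does not give the storage-location model itself. As a hedged illustration, a classic turnover-based (frequency-based) rule places the parts with the highest predicted pick frequency in the slots closest to the input/output point. If each part occupies exactly one slot, pairing frequencies sorted high-to-low with distances sorted low-to-high minimizes the total of frequency × distance, by the rearrangement inequality. The one-slot-per-part assumption, the name `assign_locations`, and the example numbers are hypothetical. The predicted picks would come from a demand model like the one the study develops.

```python
def assign_locations(predicted_picks, slot_distances):
    """Turnover-based storage assignment (illustrative, one slot per part).

    predicted_picks: dict part_id -> expected picks per period (e.g. from a demand model)
    slot_distances:  dict slot_id -> travel distance from the input/output point
    Returns (assignment dict part_id -> slot_id, total weighted travel distance).
    """
    if len(slot_distances) < len(predicted_picks):
        raise ValueError("not enough slots for all parts")
    # Busiest parts first, closest slots first; zip pairs them up in that order.
    parts = sorted(predicted_picks, key=predicted_picks.get, reverse=True)
    slots = sorted(slot_distances, key=slot_distances.get)
    assignment = dict(zip(parts, slots))
    total = sum(predicted_picks[p] * slot_distances[s] for p, s in assignment.items())
    return assignment, total

# Hypothetical example: three parts, four candidate slots.
plan, cost = assign_locations({"P1": 5, "P2": 50, "P3": 20},
                              {"A": 10.0, "B": 4.0, "C": 25.0, "D": 40.0})
print(plan, cost)
```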

Classification of Seoul Metro Stations Based on Boarding/Alighting Patterns Using Machine Learning Clustering (기계학습 클러스터링을 이용한 승하차 패턴에 따른 서울시 지하철역 분류)

  • Min, Meekyung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.18 no.4 / pp.13-18 / 2018
  • In this study, we classify Seoul metro stations according to boarding and alighting patterns using machine learning techniques. The target data are the hourly numbers of boarding and alighting passengers at 233 subway stations, recorded every day from 2008 to 2017 and provided by the public data portal. A Gaussian mixture model (GMM) and K-means clustering are used to classify the stations. The distributions of passengers' boarding and alighting times can be modeled with the GMM, and the K-means algorithm then performs unsupervised clustering on the data obtained from the GMM modeling. As a result, Seoul metro stations are classified into four groups according to their boarding and alighting patterns. The results can serve as basic knowledge for analyzing the characteristics of Seoul subway stations from economic, social, and cultural perspectives. The method can also be applied to public data and big data in other areas that require clustering.
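The K-means step can be sketched in plain Python. This sketch assumes each station has already been summarized as a numeric feature vector, for example the means of the fitted GMM components for boarding and alighting hours. That feature choice, the random initialization, and the example values are assumptions, not the paper's exact pipeline. The code is ordinary Lloyd's-algorithm K-means.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance (same argmin as Euclidean distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Plain K-means (Lloyd's algorithm) on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical features per station: (mean boarding hour, mean alighting hour).
stations = [(7.5, 18.5), (8.0, 18.0), (8.5, 19.0),   # residential-like pattern
            (18.0, 8.0), (18.5, 8.5), (17.5, 7.5)]   # business-district-like pattern
labels, centers = kmeans(stations, k=2)
print(labels, centers)
```

The study reports four groups, so it would call the function with `k=4` on its real station features. K-means results depend on initialization, so trying several seeds and keeping the lowest within-cluster error is a common safeguard.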

Exploring the Factors Influencing Students' Career Maturity in Seoul City Middle School: A Machine Learning (머신러닝을 활용한 서울시 중학생 진로성숙도 예측 요인 탐색)

  • Park, Jung
    • The Journal of Bigdata / v.5 no.2 / pp.155-170 / 2020
  • The purpose of this study was to apply machine learning techniques (Decision Tree, Random Forest, XGBoost) to data from the 4th to 6th years of the Seoul Education Longitudinal Study to find the factors that predict the career maturity of middle school students in Seoul. To evaluate the results, model performance was checked against several evaluation indicators. In addition, the model was analyzed using the XGBoostExplainer package; R and RStudio were used for the study. The ranking of variable importance differed slightly between models, but 'Achievement goal awareness', 'Creativity', 'Self-concept', 'Relationship with parents and children', and 'Resilience' ranked high. The XGBoostExplainer analysis also identified, for each panel, the factors that protect or undermine career maturity, and showed that 'Achievement goal awareness' is the top-priority factor for predicting career maturity. Based on these results, the study suggests a comparative study of machine learning and variable selection methods, and a comparative study of each cohort of the Seoul Education Longitudinal Study.

Optimization with Genetic Algorithms for Food Delivery Dispatch (유전자 알고리즘을 이용한 음식 배달 최적화 기법)

  • Yang, Soyeon;Lim, Yujin
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.347-349 / 2022
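  • Amid overheated competition in the delivery market, many couriers are being pushed into a race for speed. To win that race, delivery apps introduced single-order delivery services, but the competition led to higher delivery fees, which came back to consumers as an added burden. In this paper, we aim to ease competition among couriers and to improve the throughput and reliability of the overall delivery system by distributing delivery tasks evenly. To this end, we use a genetic algorithm, characterized by objective-function optimization and randomness, to distribute delivery tasks evenly among couriers and thereby improve system performance. Experiments confirmed that the proposed algorithm performs better than the existing dispatch technique.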
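The abstract does not specify the chromosome encoding or the objective function. The sketch below assumes one gene per order, holding the index of the assigned courier. Its objective minimizes the gap between the busiest and the least busy courier's total delivery time. Tournament selection, one-point crossover, random-reset mutation, and elitism are standard genetic-algorithm operators chosen for illustration. All names, parameter values, and delivery times are hypothetical, and the sketch models only workload balance, not throughput or reliability.

```python
import random

def courier_loads(assignment, order_times, n_couriers):
    """Total delivery time per courier; assignment[i] is the courier for order i."""
    loads = [0.0] * n_couriers
    for order, courier in enumerate(assignment):
        loads[courier] += order_times[order]
    return loads

def imbalance(assignment, order_times, n_couriers):
    """Objective to minimize: gap between the busiest and least busy courier."""
    loads = courier_loads(assignment, order_times, n_couriers)
    return max(loads) - min(loads)

def ga_dispatch(order_times, n_couriers, pop_size=40, generations=200,
                mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n = len(order_times)

    def fitness(chrom):
        return imbalance(chrom, order_times, n_couriers)

    # Initial population: a random courier index for every order.
    population = [[rng.randrange(n_couriers) for _ in range(n)]
                  for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(generations):
        next_gen = [best[:]]  # elitism: carry the best solution forward unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the fittest of 3 random individuals.
            parent1 = min(rng.sample(population, 3), key=fitness)
            parent2 = min(rng.sample(population, 3), key=fitness)
            # One-point crossover.
            cut = rng.randrange(1, n) if n > 1 else 0
            child = parent1[:cut] + parent2[cut:]
            # Random-reset mutation: occasionally reassign an order to a random courier.
            for i in range(n):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_couriers)
            next_gen.append(child)
        population = next_gen
        best = min(population, key=fitness)
    return best, fitness(best)

times = [10, 20, 30, 40, 50, 60]  # hypothetical delivery times in minutes
plan, gap = ga_dispatch(times, n_couriers=3)
print(plan, courier_loads(plan, times, 3), gap)
```

Because the best individual is copied into every new generation, the reported imbalance never gets worse from one generation to the next.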

Using the SIEM Software vulnerability detection model proposed (SIEM을 이용한 소프트웨어 취약점 탐지 모델 제안)

  • Jeon, In-seok;Han, Keun-hee;Kim, Dong-won;Choi, Jin-yung
    • Journal of the Korea Institute of Information Security & Cryptology / v.25 no.4 / pp.961-974 / 2015
  • As SIEM has evolved from ESM, it allows deep correlation analysis over huge amounts of data. It collects software vulnerabilities found in assessments and classifies them under a scheme such as CWE. This lets it improve the detection rate effectively and respond to software vulnerabilities through big data analysis. During the monitoring and vulnerability diagnosis process, it not only detects predefined threats but also lets software vulnerabilities in each resource be handled promptly by sharing CCE, CPE, CVE, and CVSS information. This paper proposes a model for the effective detection of and response to software vulnerabilities and describes the effective outcomes of applying the model.
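This sketch illustrates one way shared CPE, CVE, and CVSS information can be correlated. It matches each monitored asset's installed software, given as CPE names, against a simplified CVE feed and ranks the resulting alerts by CVSS base score. Severity labels follow the CVSS v3.x qualitative rating scale. The record layout, the `match_assets` function, and the example identifiers are hypothetical. Real CPE matching also handles wildcards and version ranges, which this exact-string comparison ignores.

```python
from dataclasses import dataclass

@dataclass
class CveRecord:
    """Hypothetical, simplified vulnerability record."""
    cve_id: str
    cpe: str      # affected product as a CPE 2.3 name
    cvss: float   # CVSS v3.x base score (0.0 to 10.0)

def severity(score):
    """CVSS v3.x qualitative severity rating scale."""
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"

def match_assets(asset_cpes, cve_feed):
    """Correlate each asset's installed software with the feed; highest CVSS first."""
    alerts = []
    for asset, cpes in asset_cpes.items():
        for rec in cve_feed:
            if rec.cpe in cpes:  # exact match only (a simplification)
                alerts.append((asset, rec.cve_id, rec.cvss, severity(rec.cvss)))
    return sorted(alerts, key=lambda alert: alert[2], reverse=True)

WEB = "cpe:2.3:a:example:webserver:1.0:*:*:*:*:*:*:*"
DB = "cpe:2.3:a:example:database:2.1:*:*:*:*:*:*:*"
feed = [CveRecord("CVE-EXAMPLE-1", WEB, 9.8), CveRecord("CVE-EXAMPLE-2", DB, 5.3)]
for alert in match_assets({"web01": {WEB}, "db01": {DB}}, feed):
    print(alert)
```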

Self-Disclosure and Boundary Impermeability among Languages of Twitter Users (트위터 이용자의 언어권별 자기노출 및 경계 불투과성)

  • Jang, Phil-Sik
    • The Journal of the Korea Contents Association / v.16 no.4 / pp.434-441 / 2016
  • Using big data analysis procedures, the present study sought to review and explore various aspects of self-disclosure and boundary impermeability among Twitter users worldwide. A total of 415 million tweets posted by 54 million users were collected over 6 months, and users of the top 10 languages were investigated. The study analyzed the effect of users' language on boundary impermeability and on the disclosure rates of the user profile, profile image, geographical information, URL in profile, and user description. The results showed that boundary impermeability and all self-disclosure rates (profile, profile image, geographical information, URL in profile, user description) differed significantly (p<0.001) among language groups. The self-disclosure rates and average scores of Portuguese, Indonesian, and Spanish users were higher than those of Arabic, Japanese, Turkish, and Korean users. The results also showed a positive relationship between boundary impermeability and the number of tweets (including retweets) posted by each user.
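The abstract reports significant differences (p<0.001) in disclosure rates among language groups but does not name the test used. One standard choice for comparing proportions across groups is Pearson's chi-square test of independence. It runs on a table with one row per group and columns for "disclosed" and "not disclosed". The sketch below computes the statistic and degrees of freedom using only the standard library. The function name and the example counts are hypothetical. A p-value can then be obtained from the chi-square survival function, for example `scipy.stats.chi2.sf(stat, dof)`.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts.

    Rows: groups (e.g. language groups); columns: outcomes (e.g. disclosed, not disclosed).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and outcome were independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical counts: [users who disclosed location, users who did not] per group.
stat, dof = chi_square_independence([[30, 10], [10, 30]])
print(stat, dof)
```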

A Design of Satisfaction Analysis System For Content Using Opinion Mining of Online Review Data (온라인 리뷰 데이터의 오피니언마이닝을 통한 콘텐츠 만족도 분석 시스템 설계)

  • Kim, MoonJi;Song, EunJeong;Kim, YoonHee
    • Journal of Internet Computing and Services / v.17 no.3 / pp.107-113 / 2016
  • With the recent growth in the use of social networks, a vast number of diverse online reviews are being created. These reviews, which provide feedback on content, serve as sources of valuable information for both content users and content providers. As online reviews have become more important, there have been more studies on opinion mining, which analyzes online reviews to extract the writer's opinions, evaluations, attitudes, and emotions. However, previous sentiment analysis techniques in opinion mining focus only on classifying reviews as positive or negative. They do not analyze the user's satisfaction in detail or the grounds for the sentiment. In addition, previous sentiment analysis techniques were designed for a single content domain, such as products or movies, and could not be applied to content from other domains. This paper proposes a sentiment analysis technique that analyzes the detailed satisfaction expressed in online reviews and extracts detailed information on the level of satisfaction. The proposed technique can analyze not only a single content domain but also a variety of content from different domains. In addition, we design a Hadoop-based system to process vast amounts of data quickly and efficiently. Through the proposed system, both users and content providers will be able to receive clearer and more detailed feedback. Consequently, potential users of the content can make effective decisions, and content providers can quickly apply users' responses when developing marketing strategies, instead of relying on traditional surveys. Moreover, the system is expected to be of practical use in various fields that require user comments.

A Study for Development of Expressway Traffic Accident Prediction Model Using Deep Learning (딥 러닝을 이용한 고속도로 교통사고 건수 예측모형 개발에 관한 연구)

  • Rye, Jong-Deug;Park, Sangmin;Park, Sungho;Kwon, Cheolwoo;Yun, Ilsoo
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.17 no.4 / pp.14-25 / 2018
  • In recent years, the Big Data era has made it technically easier to explain the factors related to traffic accidents. It is therefore necessary to apply the latest analysis techniques to traffic accident data and to seek new findings. The purpose of this study is to compare the predictive performance of a negative binomial regression model and a deep learning model developed in this study for predicting the frequency of traffic accidents on expressways. In terms of prediction performance, the MOEs (measures of effectiveness) of the deep learning model are somewhat superior to those of the negative binomial regression model. This suggests that a deep learning model could increase predictive reliability. In addition, it is easy to add other independent variables when using deep learning, and predictive reliability can be expected to increase even when the model structure is changed.
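The abstract does not list which MOEs were used. Mean absolute error (MAE) and root mean squared error (RMSE) are common choices for comparing accident-count predictions. The sketch below computes both so that two models' predictions for the same road segments can be compared side by side. The metric choice, the function names, and the example counts are assumptions. A lower value indicates better predictive performance.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE does."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical accident counts per expressway segment and two models' predictions.
observed = [2, 0, 3, 1]
model_a = [2, 1, 3, 1]
model_b = [1, 1, 2, 2]
for name, pred in [("model A", model_a), ("model B", model_b)]:
    print(name, mae(observed, pred), rmse(observed, pred))
```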

Military Security Policy Research Using Big Data and Text Mining (빅데이터와 텍스트마이닝 기법을 활용한 군사보안정책 탐구)

  • Kim, Doo Hwan;Park, Ho Jeong
    • Convergence Security Journal / v.19 no.4 / pp.23-34 / 2019
  • As a policy direction study on the Army's military security, this study used big data, one of the new technologies of the Fourth Industrial Revolution. By applying text mining to analyze military security trends in domestic and foreign papers, it aims to help set policy directions and reduce trial and error. We found differences between domestic and international studies on military security. First, domestic research shows a strong interest, in the course of the Fourth Industrial Revolution, in technological security, such as IT security technology and cybersecurity related to North Korea. Foreign research, on the other hand, studies policies on the premise that military security is needed at the level of cooperation between countries and can contribute to world peace. Various academic policy studies are under way from the standpoint of world peace and security between countries, not just national security levels. This contrasts with our decades-long, immediate confrontation with North Korea, but it suggests complementary measures that should not be overlooked from a broader perspective. In conclusion, military security should be recognized as a policy product to be studied within a security system between countries. Domestic and foreign academic research should therefore take a macro perspective based on national network cooperation, rather than being limited to technology security research.