• Title/Summary/Keyword: Bigdata

Artificial Intelligence Algorithms, Model-Based Social Data Collection and Content Exploration (소셜데이터 분석 및 인공지능 알고리즘 기반 범죄 수사 기법 연구)

  • An, Dong-Uk;Leem, Choon Seong
    • The Journal of Bigdata / v.4 no.2 / pp.23-34 / 2019
  • Crimes that exploit digital platforms continue to increase: about 140,000 cases occurred in 2015 and about 150,000 in 2016. Conventional investigation techniques are therefore reaching their limits in handling online crime. The manual online searches and cognitive investigation methods widely used by investigators today are not enough to cope proactively with rapidly changing civil crimes. In addition, the fact that content is posted to unspecified social media users makes investigation more difficult. Considering the characteristics of the online media where infringement crimes occur, this study suggests site-based collection and the Open API among web content collection methods. Because illegal content is published and deleted quickly, and new words and variant spellings appear quickly and in many forms, dictionary-based morphological analysis that relies on manually registered entries struggles to recognize them in time. To solve this problem, we propose adding tokenization through WPM (Word Piece Model) to the existing dictionary-based morphological analysis, as a data preprocessing method for quickly recognizing and responding to illegal content posted in online infringement crimes. In the data analysis, the optimal precision is verified through a vote-based ensemble method using supervised classification models for the investigation of illegal content. This study applies a classification algorithm model to cases of illegal multilevel business in order to proactively recognize crimes that harm the public economy, and presents an empirical study on dealing effectively with social data collection and content investigation.
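
The two techniques named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it shows only the inference-time step of a WordPiece-style tokenizer (greedy longest-match splitting against an already learned vocabulary) and plain hard voting over classifier outputs. The vocabulary, the labels, and the function names `wordpiece_tokenize` and `majority_vote` are hypothetical.

```python
from collections import Counter


def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no known piece covers this position
        tokens.append(piece)
        start = end
    return tokens


def majority_vote(labels):
    """Hard voting: return the label predicted by the most classifiers."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical vocabulary; a real one would be learned from the collected posts.
VOCAB = {"multi", "##level", "##s", "pay", "##back"}

print(wordpiece_tokenize("multilevels", VOCAB))
print(majority_vote(["illegal", "legal", "illegal"]))
```

The point of the subword split is that a newly coined word absent from a manually maintained dictionary can still be broken into pieces a classifier has seen before.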

Development of Long-Term Electricity Demand Forecasting Model using Sliding Period Learning and Characteristics of Major Districts (주요 지역별 특성과 이동 기간 학습 기법을 활용한 장기 전력수요 예측 모형 개발)

  • Gong, InTaek;Jeong, Dabeen;Bak, Sang-A;Song, Sanghwa;Shin, KwangSup
    • The Journal of Bigdata / v.4 no.1 / pp.63-72 / 2019
  • For electric power, optimal generation and distribution plans based on accurate demand forecasts are necessary because electricity cannot be recovered once it has been delivered to users through the generation and transmission processes. Failure to predict power demand can cause various social and economic problems, such as the massive power outage of September 2011. Previous studies on power demand forecasting developed ARIMA, neural network models, and other methods. However, limitations such as the use of the national average ambient air temperature and the application of uniform criteria to distinguish seasons distort the data or degrade the performance of the predictive model. To improve the performance of the power demand prediction model, we divided Korea into five major regions and developed linear regression and neural network prediction models that reflect seasonal characteristics through regional characteristics and the sliding period learning technique. With the proposed approach, it appears possible to forecast future demand in the short term as well as the long term. It is also possible to take into account various events and exceptional cases during a certain period.
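
The abstract does not spell out how sliding period learning works, so the following is only one plausible reading, sketched under stated assumptions: the model is refit on the most recent `window` observations before each forecast, so that older seasons stop influencing the fit. The single predictor (regional temperature), the function names, and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the xs in the window are not all identical
    return slope, mean_y - slope * mean_x


def sliding_forecasts(xs, ys, window):
    """Refit on the latest `window` points, then predict the next point."""
    predictions = []
    for t in range(window, len(xs)):
        slope, intercept = fit_line(xs[t - window:t], ys[t - window:t])
        predictions.append(slope * xs[t] + intercept)
    return predictions


# Hypothetical data: temperature (x) and demand (y) for one region.
temperature = [1.0, 2.0, 3.0, 4.0, 5.0]
demand = [2 * x + 1 for x in temperature]
print(sliding_forecasts(temperature, demand, window=3))
```

Running one such model per region, rather than one model on the national average temperature, corresponds to the regional split described in the abstract.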

An elastic distributed parallel Hadoop system for bigdata platform and distributed inference engines (동적 분산병렬 하둡시스템 및 분산추론기에 응용한 서버가상화 빅데이터 플랫폼)

  • Song, Dong Ho;Shin, Ji Ae;In, Yean Jin;Lee, Wan Gon;Lee, Kang Se
    • Journal of the Korean Data and Information Science Society / v.26 no.5 / pp.1129-1139 / 2015
  • The inference process generates additional triples from knowledge represented as RDF triples in semantic web technology. Tens of millions of triples serving as the initial big data, together with the additionally inferred triples, become a knowledge base for applications such as QA (question and answer) systems. The inference engine requires more computing resources to process the triples generated during inference. Additional computing resources supplied by the underlying resource pool in cloud computing can shorten the execution time. This paper presents an algorithm that allocates the number of computing nodes "elastically" at runtime on Hadoop, depending on the size of the knowledge data fed in. The proposed model has a layered architecture: the top layer for applications, the middle layer for a distributed parallel inference engine that processes the triples, and the lower layer for elastic Hadoop and server virtualization. System algorithms and test data are analyzed and discussed. The model has the benefit that the rich set of legacy Hadoop applications can run faster on this system without any modification.
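
As a rough illustration of sizing a cluster "elastically" from the amount of knowledge data, the sketch below derives a node count from the current number of triples. The paper's actual allocation algorithm is not given in the abstract; the per-node capacity, the node limits, and the function name `nodes_needed` are assumptions made for this example.

```python
import math


def nodes_needed(triple_count, triples_per_node, min_nodes=1, max_nodes=8):
    """Return how many compute nodes to request for the current data size."""
    if triples_per_node <= 0:
        raise ValueError("triples_per_node must be positive")
    wanted = math.ceil(triple_count / triples_per_node)
    # Clamp to what the (hypothetical) resource pool can supply.
    return max(min_nodes, min(max_nodes, wanted))


# Re-evaluate as inference adds triples to the knowledge base.
for triples in (5_000_000, 25_000_000, 500_000_000):
    print(triples, nodes_needed(triples, triples_per_node=10_000_000))
```

Calling such a function again after each inference round would let the node count follow the growth of the inferred triples.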

Factors influencing metabolic syndrome perception and exercising behaviors in Korean adults: Data mining approach (대사증후군의 인지와 신체활동 실천에 영향을 미치는 요인: 데이터 마이닝 접근)

  • Lee, Soo-Kyoung;Moon, Mikyung
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.12 / pp.581-588 / 2017
  • This study was conducted from July 2014 to December 2015 to determine which factors predict metabolic syndrome (MetS) perception and exercise by applying a machine learning classifier, the Extreme Gradient Boosting algorithm (XGBoost). Data were obtained from the Korean Community Health Survey (KCHS), which represents community-dwelling Korean adults aged 19 years and older, for 2009 to 2013. The dataset includes 370,430 adults. Outcomes were categorized by perception of MetS and physical activity (PA): Stage 1 (no perception, no PA), Stage 2 (perception, no PA), and Stage 3 (perception, PA). Features common to all questionnaires over the 5 years were selected for modeling. Overall, there were 161 features, all categorical except for age and the visual analogue scale (EQ-VAS). We built the prediction model with the XGBoost algorithm in R and achieved a prediction accuracy of 0.735. The top 10 predictive factors in Stage 3 were: age, education level, attempt to control weight, EQ mobility, nutrition label checks, private health insurance, EQ-5D usual activities, anti-smoking advertising, EQ-VAS, education in health centers for diabetes, and dental care. In conclusion, the results showed that XGBoost can be used to identify factors influencing disease prevention and management using healthcare big data.

Bigdata Analysis of Fine Dust Theme Stock Price Volatility According to PM10 Concentration Change (PM10 농도변화에 따른 미세먼지 테마주 주가변동 빅데이터 분석)

  • Kim, Mu Jeong;Lim, Gyoo Gun
    • Journal of Service Research and Studies / v.10 no.1 / pp.55-67 / 2020
  • Fine dust has recently become one of the greatest concerns of the Korean people and the target of considerable effort by the central and local governments. In academia, much research has been carried out on fine dust, but relatively little of it addresses the economic field. We therefore examined how fine dust affects the economy. Big data on PM10 concentration and on the prices of fine dust theme stocks were collected for the five years from 2013 to 2017. Regression analysis was performed using the linear regression model with the generalized least squares method. As a result, changes in fine dust concentration were found to have an effect on the prices of the related theme stocks. When the fine dust concentration increased compared to the previous day, the prices of fine dust theme stocks also tended to increase. In addition, the analysis of stock price changes from 2013 to 2017 showed that the companies with large regression coefficients changed every year. Among them, the regression coefficients were repeatedly high for Monalisa in 2014, 2015, and 2017, for Samil Pharmaceutical in 2015, 2016, and 2017, and for Welcron in 2016 and 2017, and these companies were judged to be sensitive to the fine dust concentration. The companies that responded the most over the 5 years were Wokong, Welcron, Dongsung Pharmaceutical, Samil Pharmaceutical, and Monalisa. Once enough PM2.5 measurement data have accumulated, it would be meaningful to run a comparative analysis using PM2.5 concentration as an independent variable. This study uses only the fine dust concentration as an independent variable; adding appropriate variables to increase the explanatory power is expected to yield clearer and better-explained results.

Necessity of the Physical Distribution Cooperation to Enhance Competitive Capabilities of Healthcare SCM -Bigdata Business Model's Viewpoint- (의료 SCM 경쟁역량 강화를 위한 물류공동화 도입 필요성 -빅데이터 비즈니스 모델 관점-)

  • Park, Kwang-O;Jung, Dae-Hyun;Kwon, Sang-Min
    • Management & Information Systems Review / v.39 no.3 / pp.17-35 / 2020
  • The purpose of this study is to develop business models for current situational scenarios that reflect customer needs, and to emphasize the need for a logistics cooperation system, by analyzing big data to strengthen SCM competitive capabilities. The healthcare SCM competitiveness factors behind the intent to use logistics cooperation were divided into product quality, price leadership, hand-over speed, and process flexibility for examination. In the word cloud of major considerations for achieving work efficiency between medical institutes, words such as unexpected situations, information sharing, delivery, real-time, and convenience were mentioned frequently. This can be read as expressing the need for a system that can respond immediately to emergencies on weekends. Furthermore, in addition to the pursuit of communication and convenience, the importance of real-time information sharing that contributes to efficient inventory management was evident. Accordingly, a business model is needed that enhances real-time visibility of the logistics pipeline using on-site big data analysis. The analysis of how the adaptability of a supply chain network affects healthcare SCM competitiveness revealed that competitive capabilities can be obtained by implementing logistics cooperation. Stronger partnerships such as logistics cooperation will lead to SCM competitive capabilities. SCM competitiveness will need to be strengthened through a strategic approach that promotes mutual partnerships among companies using the joint logistics system of medical institutes. In particular, it will be necessary to look for ways to utilize HCSM through big data analysis as a logistics cooperation system is built.

Social Factors Affecting Internet Searches on Cyber Bullying in Korea and America Using Social Big Data and Google Search Trends (소셜 빅데이터와 Google 검색트렌드를 활용한 한국과 미국의 사이버불링 검색에 영향을 미치는 요인 분석)

  • Song, Tae-Min;Song, Juyoung;Cheon, Mi-Kyung
    • The Journal of Bigdata / v.1 no.1 / pp.67-75 / 2016
  • This study analyzed big data extracted from Google and social media to identify factors related to searches on cyber bullying in Korea and America. The analysis of cyber bullying in Korea used social big data collected from online news sites, blogs, cafés, social network services, and message boards between January 1, 2011 and March 31, 2013. Google search trends for the search words stress, exercise, drinking, and cyber bullying were obtained for the period between January 1, 2004 and December 22, 2013. The main results were as follows. First, stress was a more significant factor in cyber bullying searches in Korea than in America. Second, in both Korea and America, stress was positively related to drinking, exercise, and cyber bullying. Third, significant differences between Korea and America were found in all paths. The study shows that both adults and teenagers are affected in Korea. Because actual exposure to cyber bullying is related to psychological and behavioral characteristics, an online application needs to be developed that can intervene in real time when cyber bullying behavior is predicted.

A Study on the Prediction for Apartment Sales Price: Focusing on the Basic Property, Economy, Education, Culture and Transportation Properties in S city, Gyeonggi-do (아파트 매매가격 예측에 관한 연구: 경기도 S시 아파트 기본속성과 경제·교육·문화·교통 속성을 중심으로)

  • Kim, Seonghun;Lee, Jung-Mok;Lee, Hyang-Seob;Yu, Su-Han;Shin, WooJin;Yu, Jong-Pil
    • The Journal of Bigdata / v.5 no.1 / pp.109-124 / 2020
  • In Korea, despite strong interest in real estate, prices are not easy to predict, because apartments are both residential spaces and investment assets. The key factors affecting apartment prices vary widely, and regional characteristics also play a part. This study was conducted to derive the factors and characteristics that affect apartment sale prices in S City, Gyeonggi-do. It is generally assumed that better subway accessibility leads to higher apartment sale prices. In S City, however, prices were slightly lower closer to Line 1, whereas prices were higher with better accessibility to the Shinbundang Line. The price was inversely related to the five-year average of government bonds and proportional to the M2 balance. The floor area ratio and the total number of parking spaces had a great influence on the price, and the presence of department stores and discount marts within 1.5 km was the most important factor among the cultural properties.

Study on Extracting Filming Location Information in Movies Using OCR for Developing Customized Travel Content (맞춤형 여행 콘텐츠 개발을 위한 OCR 기법을 활용한 영화 속 촬영지 정보 추출 방안 제시)

  • Park, Eunbi;Shin, Yubin;Kang, Juyoung
    • The Journal of Bigdata / v.5 no.1 / pp.29-39 / 2020
  • Purpose: A growing respect for individual tastes throughout society has changed consumption trends. As a result, the travel industry is also seeing customized travel emerge as a new trend that reflects consumers' personal tastes. In particular, interest is growing in 'film-induced tourism', one area of the travel industry. We aim to satisfy the travel motivation that individuals develop while watching movies through customized travel proposals, which we expect to be a catalyst for the continued development of the film-induced tourism industry. Design/methodology/approach: In this study, we implemented an OCR-based methodology for extracting and suggesting the filming locations that viewers want to visit. First, we extract a scene from a user-selected movie using OpenCV, a real-time image processing library. Next, we detect the location of characters in the scene image using the EAST model, a deep learning-based text area detection model. The detected images are preprocessed with OpenCV built-in functions to increase recognition accuracy. Finally, after the characters in the images are converted into text using Tesseract, an optical character recognition engine, the Google Map API returns the actual location information. Significance: This research is significant in that it provides personalized tourism content using Fourth Industrial Revolution technology, going beyond existing film tourism. It could be used in the future to develop film-induced tourism packages with travel agencies. It also suggests potential use for inbound as well as outbound travel.

Developing Graphic Interface for Efficient Online Searching and Analysis of Graph-Structured Bibliographic Big Data (그래프 구조를 갖는 서지 빅데이터의 효율적인 온라인 탐색 및 분석을 지원하는 그래픽 인터페이스 개발)

  • You, Youngseok;Park, Beomjun;Jo, Sunhwa;Lee, Suan;Kim, Jinho
    • The Journal of Bigdata / v.5 no.1 / pp.77-88 / 2020
  • Recently, much research has been done to organize and analyze the various complex relationships in the real world that are represented in the form of graphs. In particular, a computer science bibliography system such as DBLP is representative graph data composed of papers, their authors, and citations among papers. Because graph data is very complex in storage structure and expression, it is very difficult to search, analyze, and visualize large-scale bibliographic big data. In this paper, we develop a graphical user interface tool, called EEUM, which visualizes bibliographic big data in the form of graphs. EEUM lets users browse bibliographic big data along the connected graph structure by displaying the graph data visually, and implements search, management, and analysis of the bibliographic big data. By applying EEUM to the bibliographic graph big data provided by DBLP, we also show that it can be conveniently used to search, explore, and analyze such data. With EEUM, users can easily find influential authors or papers in each research field and use it as a convenient search and analysis tool for complex bibliographic big data, for example to get an overview of all the relationships between several authors and papers.
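
To make the graph structure concrete, here is a minimal sketch of a bibliographic graph with papers, authors, and citation edges, plus one simple way to rank "influential" papers and authors by incoming citations. This is not how EEUM works internally, which the abstract does not describe; the citation-count measure, the data, and the function names are assumptions for illustration.

```python
from collections import defaultdict


def citation_counts(citations):
    """Count incoming citation edges per paper; edges are (citing, cited)."""
    counts = defaultdict(int)
    for _citing, cited in citations:
        counts[cited] += 1
    return dict(counts)


def author_influence(authorship, citations):
    """Sum the citation counts of each author's papers, highest first."""
    per_paper = citation_counts(citations)
    totals = defaultdict(int)
    for paper, authors in authorship.items():
        for author in authors:
            totals[author] += per_paper.get(paper, 0)
    # Sort by total citations (descending), then by name for stable output.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy graph: paper -> authors, and (citing, cited) edges.
AUTHORSHIP = {"P1": ["Kim", "Lee"], "P2": ["Lee"], "P3": ["Park"]}
CITATIONS = [("P2", "P1"), ("P3", "P1"), ("P3", "P2")]

print(citation_counts(CITATIONS))
print(author_influence(AUTHORSHIP, CITATIONS))
```

A visual tool adds interactive browsing on top of such a structure; the ranking here only shows what kind of question the graph can answer.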