• Title/Abstract/Keywords: Distributed Data Mining

111 results found (processing time: 0.026 seconds)

Clustering Algorithm using the DFP-Tree based on the MapReduce

  • 서영원;김창수
    • Journal of Internet Computing and Services / Vol. 16, No. 6 / pp. 23-30 / 2015
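  • As big data has become a prominent issue, many applications that operate on the results of data analysis have been studied. Representative examples include product recommendation in e-commerce systems, search services in search engines, and friend recommendation in social network services. This paper proposes a decision frequent tree algorithm that combines the frequent pattern tree, an existing data mining technique that mines similar patterns appearing in a data set, with the decision tree, which is grounded in computer science theory. The existing frequent pattern tree algorithm guarantees accurate pattern generation from the pattern tree, but for data with highly varied patterns, such as social data, it generates so many patterns that analysis becomes difficult. The proposed algorithm addresses this problem by recasting the tree as a model that judges convergence with subtrees. The algorithm is also modeled in MapReduce to provide high-speed processing through distributed computation.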
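
The abstract does not give the details of the DFP-Tree itself, so the following is only a minimal sketch of the generic first pass that MapReduce-based frequent pattern mining typically needs: counting item supports per partition (map) and merging the partial counts (reduce). The function names, the toy partitions, and the `min_support` value are all assumptions made for illustration, not taken from the paper.

```python
from collections import Counter
from functools import reduce

def map_count(partition):
    """Map step: count item occurrences within one partition of transactions."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))  # count each item once per transaction
    return counts

def reduce_counts(left, right):
    """Reduce step: merge two partial count tables."""
    return left + right

def frequent_items(partitions, min_support):
    """Return items whose global support count reaches min_support."""
    total = reduce(reduce_counts, map(map_count, partitions), Counter())
    return {item: n for item, n in total.items() if n >= min_support}

# Hypothetical toy data: two partitions, as if held on two worker nodes.
partitions = [
    [["a", "b", "c"], ["a", "c"]],
    [["a", "d"], ["b", "c"]],
]
result = frequent_items(partitions, min_support=2)
print(result)
```

In a real cluster the map and reduce functions would run on separate nodes; here they run in one process only to show the data flow.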

Predicting Arab Consumers' Preferences on the Korean Contents Distribution

  • Park, Young-Eun;Chaffar, Soumaya;Kim, Myoung-Sook;Ko, Hye-Young
    • Journal of Distribution Science / Vol. 15, No. 4 / pp. 33-40 / 2017
  • Purpose - This study analyzes patterns in Arab consumers' preferences for Korean content on the social media platform Facebook, as Korean entertainment content is now distributed in the global marketplace. We then develop a predictive model using data mining techniques. Research design, data and methodology - To understand the growth in preference for Korean content in Arab countries, we collected data from two popular Facebook pages: 'Korean movies and drama' and 'K-pop'. We then adopted a data-driven approach based on data mining techniques. Results - The number of likes for K-pop is predicted to increase in all North African and Middle Eastern countries. For Korean movies and drama, however, the number of likes is decreasing in Algeria, Egypt and Morocco, with Tunisia the exception. The number of likes for Korean movies and drama is also predicted to decrease in Saudi Arabia and the United Arab Emirates, but not in Iraq. Conclusions - K-contents such as drama, movies and music are sometimes a gateway to a wider interest in Korean culture, food and brands. The study also offers implications for developing predictive models to forecast the consumption of, and preferences for, Korean content.

Towards a Deep Analysis of High School Students' Outcomes

  • Barila, Adina;Danubianu, Mirela;Paraschiv, Andrei Marcel
    • International Journal of Computer Science & Network Security / Vol. 21, No. 6 / pp. 71-76 / 2021
  • Education is one of the pillars of sustainable development. For this reason, the discovery of useful information in its process of adaptation to new challenges is treated with care. This paper presents the first steps in exploring, through data mining methods, the results obtained by Romanian students at the Baccalaureate (the Romanian high school graduation) exam, with the aim of an in-depth analysis to find and remedy some of the causes of unsatisfactory results. Specifically, a set of public data was collected from the website of the Ministry of Education, on which several classification methods were tested in order to find the most efficient modeling algorithm. This is the first time that this type of data has been subjected to such analysis.

Adaptive Frequent Pattern Algorithm using CAWFP-Tree based on RHadoop Platform

  • 박인규
    • Journal of Digital Convergence / Vol. 15, No. 6 / pp. 229-236 / 2017
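  • An efficient frequent pattern algorithm is an essential element of association rule mining and of mining for convergence applications, and it has wide applicability. Many pattern mining models extract information about frequent patterns and store it in an FP-tree. This paper proposes a new frequent pattern algorithm (CAWFP-Growth) based on the center of gravity of items. It computes the center between items by considering both their weights and their frequencies, improving the efficiency of the existing FP-Growth algorithm. Because the proposed method does not require the global maximum weighted support conventionally used to preserve the downward closure property, it reduces both the search time for frequent patterns and the loss of information. Experimental results show the value of the proposed method: compared with other methods that use dynamic weights, the items' center of gravity preserves accurate information about frequent patterns and reduces FP-tree processing time. In addition, big data was modeled on the MapReduce framework in pseudo-distributed mode, and modeling the proposed algorithm in fully distributed mode remains future work.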

Performance Optimization of Big Data Center Processing System - Big Data Analysis Algorithm Based on Location Awareness

  • Zhao, Wen-Xuan;Min, Byung-Won
    • International Journal of Contents / Vol. 17, No. 3 / pp. 74-83 / 2021
  • A location-aware algorithm is proposed in this study to optimize the system performance of distributed systems for processing big data with low data reliability and application performance. Compared with previous algorithms, the location-aware data block placement algorithm uses data block placement and node data recovery strategies to improve data application performance and reliability. Simulation and actual cluster tests showed that the location-aware placement algorithm proposed in this study could greatly improve data reliability and shorten the application processing time of I/O interfaces in real-time.

Big Numeric Data Classification Using Grid-based Bayesian Inference in the MapReduce Framework

  • Kim, Young Joon;Lee, Keon Myung
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 14, No. 4 / pp. 313-321 / 2014
  • In the current era of data-intensive services, the handling of big data is a crucial issue that affects almost every discipline and industry. In this study, we propose a classification method for large volumes of numeric data, which is implemented in a distributed programming framework, i.e., MapReduce. The proposed method partitions the data space into a grid structure and it then models the probability distributions of classes for grid cells by collecting sufficient statistics using distributed MapReduce tasks. The class labeling of new data is achieved by k-nearest neighbor classification based on Bayesian inference.
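
As a rough illustration of the grid idea described above, the sketch below assigns each numeric point to a grid cell, collects per-cell class counts partition by partition (map), merges them (reduce), and labels a new point from the smoothed class probabilities of its cell. It is a simplification under stated assumptions: the cell size, the Laplace smoothing constant `alpha`, the toy data, and all function names are hypothetical, and the paper's k-nearest neighbor step is replaced here by a plain per-cell posterior.

```python
from collections import Counter, defaultdict

CELL_SIZE = 1.0  # assumed grid resolution, not from the paper

def cell_of(point, cell_size=CELL_SIZE):
    """Map a numeric point to the integer index of its grid cell."""
    return tuple(int(x // cell_size) for x in point)

def map_partition(records):
    """Map step: collect per-cell class counts for one data partition."""
    counts = defaultdict(Counter)
    for point, label in records:
        counts[cell_of(point)][label] += 1
    return counts

def reduce_counts(partials):
    """Reduce step: sum the per-cell class counts from all partitions."""
    merged = defaultdict(Counter)
    for partial in partials:
        for cell, counter in partial.items():
            merged[cell].update(counter)
    return merged

def posterior(model, point, classes, alpha=1.0):
    """Laplace-smoothed class probabilities for the cell containing point."""
    counter = model.get(cell_of(point), Counter())
    total = sum(counter.values()) + alpha * len(classes)
    return {c: (counter[c] + alpha) / total for c in classes}

def classify(model, point, classes):
    """Pick the class with the highest posterior in the point's cell."""
    probs = posterior(model, point, classes)
    return max(classes, key=lambda c: probs[c])

# Hypothetical toy data split into two partitions.
partitions = [
    [((0.2, 0.3), "A"), ((0.7, 0.1), "A")],
    [((0.5, 0.5), "B"), ((1.5, 1.2), "B")],
]
model = reduce_counts(map(map_partition, partitions))
print(classify(model, (0.4, 0.4), ["A", "B"]))
```

Because only counts are exchanged between the map and reduce steps, the amount of data moved depends on the number of occupied cells rather than on the number of records.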

Dynamic Elasticities Between Financial Performance and Determinants of Mining and Extractive Companies in Jordan

  • Yusop, Nora Yusma;Alhyari, Jad Alkareem;Bekhet, Hussain Ali
    • The Journal of Asian Finance, Economics and Business / Vol. 8, No. 7 / pp. 433-446 / 2021
  • This study aims to identify the elasticities and causalities between financial performance and its determinants for the mining and extractive companies listed on Jordan's stock market over the 2005-2018 period. The conceptual framework is based on the Resource-Based View theory, and Arbitrage Pricing theory is used to describe the relationship between the external environment and the financial performance of the companies. A profitability ratio (return on assets) is used as a proxy for financial performance. The company's characteristics, macroeconomic variables, and non-economic factors are used as independent factors. The data are a panel data set for mining and extractive companies over the same period. Fully Modified Ordinary Least Squares (FMOLS), Dynamic Ordinary Least Squares (DOLS), and Pooled Mean Group (PMG) methods are applied. The empirical findings indicate that company size, sales growth, financial leverage, liquidity, and GDP growth were the critical determinants of mining and extractive companies' financial performance on the Amman Stock Exchange. Thus, the findings suggest that company characteristics and GDP growth mainly drive financial performance. Moreover, the findings reveal a bidirectional causal relationship between GDP and financial leverage and return on assets (ROA). Sound financial performance can be obtained by paying more attention to GDP growth and firms' characteristics.

Research on Security Threats Emerging from Blockchain-based Services

  • Yoo, Soonduck
    • International Journal of Internet, Broadcasting and Communication / Vol. 13, No. 4 / pp. 1-10 / 2021
  • The purpose of the study is to contribute to the positive development of blockchain technology by providing data to examine security vulnerabilities and threats to blockchain-based services and review countermeasures. The findings of this study are as follows. Threats to the security of blockchain-based services can be classified into application security threats, smart contract security threats, and network (P2P) security threats. First, application security threats include wallet theft (e-wallet stealing), double spending (double payment attack), and cryptojacking (mining malware infection). Second, smart contract security threats are divided into reentrancy attacks, replay attacks, and balance increasing attacks. Third, network (P2P) security threats are divided into the 51% control attack, Sybil attack, balance attack, eclipse attack (spread false information attack), selfish mining (selfish mining monopoly), block withholding attack, DDoS attack (distributed service denial attack) and DNS/BGP hijacks. Through this study, it is possible to discuss the future plans of the blockchain technology-based ecosystem through understanding the functional characteristics of transparency or some privacy that can be obtained within the blockchain. It also supports effective coping with various security threats.

The study of a full cycle semi-automated business process re-engineering: A comprehensive framework

  • Lee, Sanghwa;Sutrisnowati, Riska A.;Won, Seokrae;Woo, Jong Seong;Bae, Hyerim
    • Journal of the Korea Society of Computer and Information / Vol. 23, No. 11 / pp. 103-109 / 2018
  • This paper presents an idea and framework to automate full-cycle business process management and re-engineering by integrating traditional business process management systems, process mining, data mining, machine learning, and simulation. We build our framework on a cloud-based platform so that various data sources can be incorporated. We design our systems to be extensible so that they benefit not only BPM practitioners but also researchers. Our framework can be used as a test bed for researchers without the complication of system integration. The automation of the redesign phase and the selection of a baseline process model for deployment are the two main contributions of this study. In the redesign phase, we deal with both the analysis of the existing process model and what-if analysis on how to improve the process at the same time. Additionally, improving a business process may have to be done on a case-by-case basis, which requires a lot of trial and error and large amounts of data. In selecting the baseline process model, we need to compare many probable routes of business execution and calculate the most efficient one with respect to production cost and execution time. We also discuss the challenges and limitations of the framework, including the system's adoptability, technical difficulties and human factors.

Study of the Activation Plan for Rural Tourism of the Jeollabuk-do Using Big Data Analysis

  • 박로운;이기훈
    • The Korean Journal of Community Living Science / Vol. 27, Special Issue / pp. 665-679 / 2016
  • This study examined the main factors for activating rural tourism in Jeollabuk-do using big data analysis. The tourism big data was gathered from public open data sources and social network services (SNS), and the analysis tools 'Opinion Mining', 'Text Mining', and 'Social Network Analysis (SNA)' were used. The opinion mining and text mining analyses identified the key local contents of the 14 areas of Jeollabuk-do and customers' evaluations of rural tourism. Social network analysis detected the relationships between the contents and determined their importance. The results showed that each location in Jeollabuk-do had its own specific contents attracting visitors and that the number of contents affected the scale of tourism. In addition, the number of visitors tended to be large when a location's tourism contents were strongly correlated with other contents. Hence, strong connections among contents are a key to activating rural tourism. Social network analysis divided the contents into several clusters and derived the eigenvector centralities of the content nodes, which indicate their importance in the network. Tourism was active when the nodes with high eigenvector centrality were distributed evenly across the clusters; the opposite held when those nodes were concentrated in a few clusters. This study suggests an action plan to extend rural tourism that develops valuable contents and connects the content clusters properly.
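
The eigenvector centrality mentioned above can be illustrated with a small power-iteration sketch. This is not the authors' tooling, which the abstract does not name; the graph, the node names, and the function below are hypothetical. The iteration is applied to the adjacency matrix plus the identity, a common shift that leaves the eigenvectors unchanged while avoiding oscillation on bipartite graphs.

```python
import math

def eigenvector_centrality(graph, iterations=200, tol=1e-10):
    """Power iteration on (A + I) for an undirected graph given as an adjacency dict."""
    score = {node: 1.0 for node in graph}
    for _ in range(iterations):
        # Each node adds its neighbors' scores to its own (the "+ I" shift).
        new = {
            node: score[node] + sum(score[nbr] for nbr in graph[node])
            for node in graph
        }
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {node: v / norm for node, v in new.items()}
        if max(abs(new[n] - score[n]) for n in graph) < tol:
            return new
        score = new
    return score

# Hypothetical content network: a triangle a-b-c with d attached only to a.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}
centrality = eigenvector_centrality(graph)
print(max(centrality, key=centrality.get))
```

A node scores highly when its neighbors score highly, which is why the hub of the toy network ranks first and the node attached only to the hub ranks last.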