• Title/Summary/Keyword: Large data

Search Result 14,025, Processing Time 0.035 seconds

Gene Algorithm of Crowd System of Data Mining

  • Park, Jong-Min
    • Journal of information and communication convergence engineering
    • /
    • v.10 no.1
    • /
    • pp.40-44
    • /
    • 2012
  • Data mining, which is attracting public attention, is a process of drawing out knowledge from a large mass of data. The key technique in data mining is the ability to maximize the similarity in a group and minimize the similarity between groups. Since grouping in data mining deals with a large mass of data, it lessens the amount of time spent with the source data, and grouping techniques that shrink the quantity of the data form to which the algorithm is subjected are actively used. The current grouping algorithm is highly sensitive to static and reacts to local minima. The number of groups has to be stated depending on the initialization value. In this paper we propose a gene algorithm that automatically decides on the number of grouping algorithms. We will try to find the optimal group of the fittest function, and finally apply it to a data mining problem that deals with a large mass of data.

A Method for Analyzing Web Log of the Hadoop System for Analyzing a Effective Pattern of Web Users (효과적인 웹 사용자의 패턴 분석을 위한 하둡 시스템의 웹 로그 분석 방안)

  • Lee, Byungju;Kwon, Jungsook;Go, Gicheol;Choi, Yonglak
    • Journal of Information Technology Services
    • /
    • v.13 no.4
    • /
    • pp.231-243
    • /
    • 2014
  • Of the various data that corporations can approach, web log data are important data that correspond to data analysis to implement customer relations management strategies. As the volume of approachable data has increased exponentially due to the Internet and popularization of smart phone, web log data have also increased a lot. As a result, it has become difficult to expand storage to process large amounts of web logs data flexibly and extremely hard to implement a system capable of categorizing, analyzing, and processing web log data accumulated over a long period of time. This study thus set out to apply Hadoop, a distributed processing system that had recently come into the spotlight for its capacity of processing large volumes of data, and propose an efficient analysis plan for large amounts of web log. The study checked the forms of web log by the effective web log collection methods and the web log levels by using Hadoop and proposed analysis techniques and Hadoop organization designs accordingly. The present study resolved the difficulty with processing large amounts of web log data and proposed the activity patterns of users through web log analysis, thus demonstrating its advantages as a new means of marketing.

An Efficient Large Graph Clustering Technique based on Min-Hash (Min-Hash를 이용한 효율적인 대용량 그래프 클러스터링 기법)

  • Lee, Seok-Joo;Min, Jun-Ki
    • Journal of KIISE
    • /
    • v.43 no.3
    • /
    • pp.380-388
    • /
    • 2016
  • Graph clustering is widely used to analyze a graph and identify the properties of a graph by generating clusters consisting of similar vertices. Recently, large graph data is generated in diverse applications such as Social Network Services (SNS), the World Wide Web (WWW), and telephone networks. Therefore, the importance of graph clustering algorithms that process large graph data efficiently becomes increased. In this paper, we propose an effective clustering algorithm which generates clusters for large graph data efficiently. Our proposed algorithm effectively estimates similarities between clusters in graph data using Min-Hash and constructs clusters according to the computed similarities. In our experiment with real-world data sets, we demonstrate the efficiency of our proposed algorithm by comparing with existing algorithms.

A Simple and Fast Web Alignment Tool for Large Amount of Sequence Data

  • Lee, Yong-Seok;Oh, Jeong-Su
    • Genomics & Informatics
    • /
    • v.6 no.3
    • /
    • pp.157-159
    • /
    • 2008
  • Multiple sequence alignment (MSA) is the most important step for many of biological sequence analyses, homology search, and protein structural assignments. However, large amount of data make biologists difficult to perform MSA analyses and it requires much computational time to align many sequences. Here, we have developed a simple and fast web alignment tool for aligning, editing, and visualizing large amount of sequence data. We used a cluster server installed ClustalW-MPI using web services and message passing interface (MPI). It also enables users to edit multiple sequence alignments for manual editing and to download the input data and results such as alignments and phylogenetic tree.

Privacy Enhanced Data Security Mechanism in a Large-Scale Distributed Computing System for HTC and MTC

  • Rho, Seungwoo;Park, Sangbae;Hwang, Soonwook
    • International Journal of Contents
    • /
    • v.12 no.2
    • /
    • pp.6-11
    • /
    • 2016
  • We developed a pilot-job based large-scale distributed computing system to support HTC and MTC, called HTCaaS (High-Throughput Computing as a Service), which helps scientists solve large-scale scientific problems in areas such as pharmaceutical domains, high-energy physics, nuclear physics and bio science. Since most of these problems involve critical data that affect the national economy and activate basic industries, data privacy is a very important issue. In this paper, we implement a privacy enhanced data security mechanism to support HTC and MTC in a large-scale distributed computing system and show how this technique affects performance in our system. With this mechanism, users can securely store data in our system.

The Negative Impact Study on the Information of the Large Discount Retailers

  • Kim, Jong-Jin
    • Journal of Distribution Science
    • /
    • v.13 no.7
    • /
    • pp.33-40
    • /
    • 2015
  • Purpose - This study aims to find out what impacts large retailers' behaviors appearing when they promote the strengthening of their market dominating power in the trade relations with small and medium suppliers or in the market can have on consumers. Research design, data, methodology - This study analyzed negative information (news) on large retailers (Lotte Mart, E-Mart and Homeplus) based on the monthly data over the past five years from 2008 to 2012 and also analyzed the correlation between dependent variables that are likely to affect sales through large retailer economic index, Results - This study conducted a correlation analysis on the time lag of the factors that have an impact on the negative information and sales of large retailers in order to analyze how consumers respond to the choice of large retailers' store (store sales) when they perceived negative information about the un- ethical behaviors of large retailers. Conclusions - Unfair and negative information on large retailers appeared significant for the hypothesis that sales will be affected by the image of large retailers and change of consumer attitudes.

A design and implementation of transmit/receive model to speed up the transmission of large string-data sets in TCP/IP socket communication (TCP/IP 소켓통신에서 대용량 스트링 데이터의 전송 속도를 높이기 위한 송수신 모델 설계 및 구현)

  • Kang, Dong-Jo;Park, Hyun-Ju
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.4
    • /
    • pp.885-892
    • /
    • 2013
  • In the model Utilizing the TCP / IP socket communication to transmit and receive data, if the size of data is small and if data-transmission aren't frequently requested, the importance of communication speed between a server and a client isn't emphasized. But nowadays, it has emerged for large amounts of data transfer requests and frequent data transfer request. This paper propose the TCP/IP communication model that can be improved the data transfer rate in multi-core environment by changing the receiving structure of the client to receive large amounts of data and the transmission structure of the server to send large amounts of data.

Asymptotics in Load-Balanced Tandem Networks

  • Lee, Ji-Yeon
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.3
    • /
    • pp.715-723
    • /
    • 2003
  • A tandem network in which all nodes have the same load is considered. We derive bounds on the probability that the total population of the tandem network exceeds a large value by using its relation to the stationary distribution. These bounds imply a stronger asymptotic limit than that in the large deviation theory.

  • PDF

The Effects of Trading-Hour Regulations on Large Stores in Korea

  • Kim, Woohyoung;Lee, Hahn-Shik
    • Journal of Distribution Science
    • /
    • v.15 no.8
    • /
    • pp.5-14
    • /
    • 2017
  • Purpose - This study empirically analyses the sale changes in large retail stores directly resulting from increased controls on those stores. More specifically, we discuss the economic impacts on Korean regulations that restrict trading hours and mandate statutory store closure 'holidays' twice per month. Research design, data and methodology - we attempt to empirically analyse the economic effects of trading hours regulations through quantitative analysis of the sales revenue data of large retail stores. We introduce the data and methods of empirical analysis used to analyse the economic effects of trading-hour regulations on large retail stores. We use a panel regression to analyse the sales losses of large retail stores caused by the new constraints on business hours. Results - The results of this study show that the sales of large retail stores fell by the average of 3.4% per month during the regulation periods. However, regulations affecting large retail stores have various economic impacts, including variations in sales, changes in consumption patterns, and influences on consumer welfare and national economy. Conclusions - Such changes may also be captured by other metrics: accordingly, further researches are needed to measure the impact of regulations on economic indicators such as employment and GDP.

Effective Generation of Minimal Perfect hash Functions for Information retrival from large Sets of Data (대규모의 정보 검색을 위한 효율적인 최소 완전 해시함수의 생성)

  • Kim, Su-Hee;Park. Se-Young
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.9
    • /
    • pp.2256-2270
    • /
    • 1998
  • The development of a high perfoffilance index system is crucial for the retrieval of information from large sets o[ data. In this study, a minimal perfect hash function (MPHF), which hashes m keys to m buckets with no collisions, is revisited. The MOS algorithm developed bv Heath is modified to be successful for computing MPHFs of large sets of keys Also, a system for generating MPHFs for large sets of keys is developed. This system computed MPHFs for several large sets of data more efficiently than Heath's. The application areas for this system include those for generating MPHFs for the indexing of large and infrequently changing sets of data as well as information stored in a medium whose seek time is very slow.

  • PDF