• Title/Summary/Keyword: Big Data Processing

Search results: 1,063 (processing time: 0.03 seconds)

Image Machine Learning System using Apache Spark and OpenCV on Distributed Cluster (Apache Spark와 OpenCV를 활용한 분산 클러스터 컴퓨팅 환경 대용량 이미지 머신러닝 시스템)

  • Hayoon Kim;Wonjib Kim;Hyeopgeon Lee;Young Woon Kim
    • Annual Conference of KIPS
    • /
    • 2023.05a
    • /
    • pp.33-34
    • /
    • 2023
  • The growing big data market and the exponential increase in the amount of big data make data processing difficult in conventional computing environments. In particular, image data processing slows markedly as data volume grows. This paper therefore proposes a large-scale image machine learning system for a distributed cluster computing environment using Apache Spark and OpenCV. The proposed system builds a distributed cluster with Apache Spark and performs its work using OpenCV's image processing algorithms and Spark MLlib's machine learning algorithms. Through the proposed system, this paper presents a way to speed up large-scale image data processing and machine learning tasks.
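
    The partition-then-extract structure the abstract describes can be sketched with only the Python standard library; this is not the paper's implementation — `ThreadPoolExecutor` stands in for Spark executors, and `extract_features` stands in for an OpenCV preprocessing call (all names here are hypothetical):

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def extract_features(img):
        # stand-in for an OpenCV preprocessing step: mean intensity plus a
        # simple horizontal-gradient sum as a crude edge measure
        flat = [p for row in img for p in row]
        mean = sum(flat) / len(flat)
        grad = sum(abs(row[i + 1] - row[i]) for row in img for i in range(len(row) - 1))
        return [mean, grad]

    def process_partition(images):
        # each worker handles one partition, as a Spark executor would
        return [extract_features(img) for img in images]

    def run_pipeline(partitions):
        # fan the partitions out to workers and collect the feature vectors,
        # mimicking a map over partitioned image data
        with ThreadPoolExecutor(max_workers=2) as pool:
            parts = pool.map(process_partition, partitions)
        return [features for part in parts for features in part]
    ```

    In the actual system, each partition would be a slice of a Spark RDD or DataFrame, and the resulting feature vectors would feed a Spark MLlib estimator.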

PSS Evaluation Based on Vague Assessment Big Data: Hybrid Model of Multi-Weight Combination and Improved TOPSIS by Relative Entropy

  • Lianhui Li
    • Journal of Information Processing Systems
    • /
    • v.20 no.3
    • /
    • pp.285-295
    • /
    • 2024
  • Driven by vague assessment big data, a product service system (PSS) evaluation method is developed based on a hybrid model of multi-weight combination and a TOPSIS improved by relative entropy. The index values of the PSS alternatives are obtained by integrating the stakeholders' vague assessment comments, which are expressed as trapezoidal fuzzy numbers. A multi-weight combination method is proposed to determine the index weights for the PSS evaluation decision. An improved TOPSIS based on relative entropy (RE) is presented to overcome the shortcomings of traditional TOPSIS and its modified variants, and the PSS alternatives are then evaluated. A PSS evaluation case from a printer company is used to test and verify the proposed model. The RE closeness values of the seven PSS alternatives are 0.3940, 0.5147, 0.7913, 0.3719, 0.2403, 0.4959, and 0.6332, and the alternative with the highest RE closeness is selected as the best. Comparison examples show that the presented model can compensate for the shortcomings of existing traditional methods.
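
    As a rough illustration of the improved-TOPSIS idea, the sketch below replaces the usual Euclidean distances to the ideal and anti-ideal solutions with a relative-entropy divergence. It does not reproduce the paper's exact formulation (trapezoidal fuzzy aggregation and combined weighting are omitted), and every name and constant is illustrative:

    ```python
    import math

    def weighted_normalize(matrix, weights):
        # vector-normalize each criterion column, then apply its weight
        cols = list(zip(*matrix))
        norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
        return [[w * v / n for v, n, w in zip(row, norms, weights)] for row in matrix]

    def rel_entropy(p, q, eps=1e-9):
        # KL-style divergence D(p || q), after normalizing both vectors to
        # distributions and clamping away from zero
        sp = sum(p) + len(p) * eps
        sq = sum(q) + len(q) * eps
        return sum(((pi + eps) / sp) * math.log(((pi + eps) / sp) / ((qi + eps) / sq))
                   for pi, qi in zip(p, q))

    def re_topsis(matrix, weights):
        # closeness of each alternative: far from the anti-ideal, near the ideal
        V = weighted_normalize(matrix, weights)
        ideal = [max(col) for col in zip(*V)]
        anti = [min(col) for col in zip(*V)]
        scores = []
        for row in V:
            d_plus = rel_entropy(ideal, row)   # divergence from the ideal
            d_minus = rel_entropy(anti, row)   # divergence from the anti-ideal
            scores.append(d_minus / (d_plus + d_minus))
        return scores
    ```

    All criteria are treated as benefit criteria here; the alternative with the highest closeness score would be selected, mirroring how the paper picks the alternative with the highest RE closeness.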

Subnet Selection Scheme based on probability to enhance process speed of Big Data (빅 데이터의 처리속도 향상을 위한 확률기반 서브넷 선택 기법)

  • Jeong, Yoon-Su;Kim, Yong-Tae;Park, Gil-Cheol
    • Journal of Digital Convergence
    • /
    • v.13 no.9
    • /
    • pp.201-208
    • /
    • 2015
  • With services such as SNS and Facebook, the use of small-sized big data such as micro-blog posts is increasing. However, the accuracy and computational cost of searching such small-sized big data remain unresolved problems. In this paper, we propose a probability-based subnet selection technique to improve the search speed for small text data in big data environments such as micro-blogs. The proposed method configures subnets according to the attribute information of the data and assigns probabilities that increase data search speed. In addition, it improves data accessibility by processing the connection information between data in a subnet as probability pairs, so that distributed data can be accessed easily. Experimental results show that the proposed method achieves a 6.8% higher detection rate than the CELF algorithm and reduces average processing time by 8.2%.
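
    The abstract leaves the scheme's details open, but the general idea of grouping data into attribute-based subnets and probing them in probability order can be sketched as follows; the grouping attribute, the hit-count estimate, and all function names are hypothetical, not taken from the paper:

    ```python
    from collections import defaultdict

    def build_subnets(records, attr):
        # group records into subnets keyed by one attribute
        subnets = defaultdict(list)
        for rec in records:
            subnets[rec[attr]].append(rec)
        return subnets

    def subnet_probabilities(subnets, query_counts):
        # estimate each subnet's probability of holding a match from past
        # query hits (unseen subnets get a default count of 1)
        total = sum(query_counts.get(k, 1) for k in subnets)
        return {k: query_counts.get(k, 1) / total for k in subnets}

    def probabilistic_search(subnets, probs, predicate):
        # probe subnets in decreasing probability order; stop at first hit
        for key in sorted(probs, key=probs.get, reverse=True):
            hits = [r for r in subnets[key] if predicate(r)]
            if hits:
                return hits
        return []
    ```

    Searching the most probable subnet first is what lets such a scheme cut the average processing time when the probability estimates are good.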

Big Data Activation Plan for Digital Transformation of Agriculture and Rural (농업·농촌 디지털 전환을 위한 빅데이터 활성화 방안 연구)

  • Lee, Won Suk;Son, Kyungja;Jun, Daeho;Shin, Yongtae
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.8
    • /
    • pp.235-242
    • /
    • 2020
  • In order to promote the digital transformation of our agricultural and rural communities in the wake of the Fourth Industrial Revolution, and to prepare for the coming artificial intelligence era, it is necessary to establish systems and institutions that can collect, analyze, and utilize the necessary high-quality data. To this end, we investigate and analyze the problems and issues experienced by various stakeholders, such as farmers and agricultural officials, and present strategic measures for revitalizing big data that must be decided upon to promote this digital transformation: expanding big data platforms for joint utilization, establishing sustainable big data governance, and revitalizing the foundation for demand-driven big data utilization.

A Study on the Development of the Key Promoting Talent in the 4th Industrial Revolution - Utilizing Six Sigma MBB Competency -

  • Kim, Kang Hee;Ree, Sang bok
    • Journal of Korean Society for Quality Management
    • /
    • v.45 no.4
    • /
    • pp.677-696
    • /
    • 2017
  • Purpose: This study suggests that Six Sigma MBBs, trained in big data processing, should serve as the key talent leading the Fourth Industrial Revolution era. Methods: By comparing articles on the Fourth Industrial Revolution with Six Sigma papers, the common competencies of data scientists and Six Sigma MBBs were identified, and the big data analysis capabilities needed by Six Sigma MBBs were derived. Training was then conducted to improve these capabilities so that Six Sigma MBBs could design the algorithms required in the Fourth Industrial Revolution era. Results: Six Sigma MBBs, already equipped with knowledge of field-site improvement and basic statistics, received 40 hours of big data analysis training and then designed a big data algorithm. Positive results were obtained after applying an AI algorithm that forecasts process defects at a field site. Conclusion: A Six Sigma MBB equipped with big data capability makes an excellent talent for the Fourth Industrial Revolution era; an MBB already has a strong capability for improving field sites, and utilizing these competencies can be a key to success. We hope that the results of this study will be shared with many companies and that many more improved case studies will follow.

Standardizing Unstructured Big Data and Visual Interpretation using MapReduce and Correspondence Analysis (맵리듀스와 대응분석을 활용한 비정형 빅 데이터의 정형화와 시각적 해석)

  • Choi, Joseph;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.2
    • /
    • pp.169-183
    • /
    • 2014
  • Massive and various types of data recorded everywhere are called big data. It is therefore important to analyze big data and to find valuable information in it. Moreover, standardizing unstructured big data is important for the application of statistical methods. In this paper, we show how to standardize unstructured big data using MapReduce, a distributed processing system. We also apply simple correspondence analysis and multiple correspondence analysis to find the relationships and characteristics of words directly related to Samsung Electronics and Apple Inc. in The Korea Economic Daily newspaper.
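
    The standardization step the paper assigns to MapReduce — turning raw text into a structured term-frequency table — can be sketched in miniature with Python's built-in `map` and `reduce`; the tokenization rule here is a simplifying assumption, not the paper's:

    ```python
    import re
    from collections import Counter
    from functools import reduce

    def map_phase(document):
        # mapper: standardize raw text (lower-case, letters only) into counts
        return Counter(re.findall(r"[a-z]+", document.lower()))

    def reduce_phase(a, b):
        # reducer: merge the partial counts produced by two mappers
        return a + b

    def standardize(documents):
        # the structured output (a term-frequency table) is what a
        # correspondence analysis would then take as input
        return reduce(reduce_phase, map(map_phase, documents), Counter())
    ```

    In a real Hadoop job the mappers would run on separate nodes over document splits, but the map/shuffle/reduce contract is the same.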

Bio-Sensing Convergence Big Data Computing Architecture (바이오센싱 융합 빅데이터 컴퓨팅 아키텍처)

  • Ko, Myung-Sook;Lee, Tae-Gyu
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.2
    • /
    • pp.43-50
    • /
    • 2018
  • Biometric information computing strongly influences both computing systems and big data systems based on bio-information systems that combine bio-signal sensors with bio-information processing. Unlike conventional data formats such as text, images, and video, biometric information is represented as text-based values that give meaning to a bio-signal; important event moments are stored in an image format; and complex formats such as video are constructed for data prediction and analysis through time-series analysis. Such a complex data structure may be requested separately as text, image, or video, depending on the characteristics of the data required by an individual biometric information application service, or several formats may be requested simultaneously depending on the situation. Because previous bio-information processing computing systems depend on conventional computing components, computing structures, and data processing methods, they suffer many inefficiencies in data processing performance, transmission capability, storage efficiency, and system safety. In this study, we propose an improved bio-sensing converged big data computing architecture as a platform that effectively supports biometric information processing. The proposed architecture supports data storage and transmission efficiency, computing performance, and system stability, and can lay the foundation for system implementation and service optimization for future biometric information computing.

Visualizing Article Material using a Big Data Analytical Tool R Language (빅데이터 분석 도구 R 언어를 이용한 논문 데이터 시각화)

  • Nam, Soo-Tai;Shin, Seong-Yoon;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2021.05a
    • /
    • pp.326-327
    • /
    • 2021
  • Recently, big data utilization has attracted wide interest across many industrial fields. Big data analysis is the process of discovering meaningful new correlations, patterns, and trends in large volumes of stored data and creating new value from them. Most big data analysis methods therefore draw on data mining, machine learning, natural language processing, and pattern recognition techniques from statistics and computer science. Using the R language, a big data analysis tool, analysis results can be expressed through various visualization functions applied to pre-processed text data. The data used in this study were 29 papers from a specific journal. In the final analysis, the most frequently mentioned keyword was "Research", which ranked first with 743 occurrences. Based on these results, the limitations of the study and its theoretical implications are discussed.
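
    The paper's keyword-frequency workflow (count terms, then visualize) uses R, but the same pipeline can be sketched in Python with a crude text bar chart standing in for R's plotting functions; the tokenization and chart layout here are illustrative assumptions:

    ```python
    from collections import Counter

    def top_keywords(texts, n=5):
        # count whitespace-delimited tokens across all input texts
        counts = Counter(word.lower() for text in texts for word in text.split())
        return counts.most_common(n)

    def ascii_bar_chart(pairs, width=20):
        # scale every bar relative to the most frequent keyword
        peak = max(count for _, count in pairs)
        return [f"{word:<12} {'#' * (count * width // peak)} {count}"
                for word, count in pairs]
    ```

    In R the equivalent would be a `table`/`wordcloud` or `barplot` call over the pre-processed text; the counting step is identical in spirit.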


Scaling of Hadoop Cluster for Cost-Effective Processing of MapReduce Applications (비용 효율적 맵리듀스 처리를 위한 클러스터 규모 설정)

  • Ryu, Woo-Seok
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.15 no.1
    • /
    • pp.107-114
    • /
    • 2020
  • This paper studies a method for estimating the scale of a Hadoop cluster that processes big data in a cost-effective manner. In the case of medical institutions, demand for cloud-based big data analysis is increasing now that medical records can be stored outside the hospital. This paper first analyzes Amazon EMR, one of the popular cloud-based big data frameworks. It then presents an efficiency model for scaling the Hadoop cluster so that a MapReduce application executes more cost-effectively, and analyzes the factors that influence MapReduce execution through several experiments under various conditions. The cost efficiency of big data analysis can be increased by choosing the cluster scale that yields the most efficient processing time relative to the operational cost.
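
    One simple way to frame the scale-versus-cost trade-off the paper studies is an Amdahl-style time model with a per-node-hour price, choosing the cheapest cluster that still meets a deadline. The model, its constants, and the deadline constraint are hypothetical illustrations, not the paper's fitted model:

    ```python
    def exec_time(nodes, serial=5.0, parallel=120.0):
        # hypothetical Amdahl-style model: fixed overhead plus divisible work
        return serial + parallel / nodes

    def cluster_cost(nodes, rate_per_node_hour=0.2):
        # operational cost grows with both node count and running time
        return nodes * exec_time(nodes) * rate_per_node_hour

    def best_scale(deadline, max_nodes=32):
        # cheapest cluster size that still finishes within the deadline
        feasible = [n for n in range(1, max_nodes + 1) if exec_time(n) <= deadline]
        return min(feasible, key=cluster_cost) if feasible else max_nodes
    ```

    Under this model, adding nodes always shortens the run but raises total cost (the serial overhead is billed on every node), so the cheapest feasible choice is the smallest cluster that meets the deadline — which is why an explicit time constraint, not raw speed, drives the sizing decision.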

Patent Document Classification by Using Hierarchical Attention Network (계층적 주의 네트워크를 활용한 특허 문서 분류)

  • Jang, Hyuncheol;Han, Donghee;Ryu, Teaseon;Jang, Hyungkuk;Lim, HeuiSeok
    • Annual Conference of KIPS
    • /
    • 2018.05a
    • /
    • pp.369-372
    • /
    • 2018
  • In knowledge management today, securing intellectual property rights through patents is a major factor in corporate operations. To secure patents successfully, one must first understand the evolving patent classification scheme and be able to classify vast amounts of patent data quickly and accurately according to it. This study proposes a method that trains a hierarchical attention network, a machine learning technique, on patent abstracts to classify patent documents. To verify the performance of the proposed network, experiments were conducted with modified input data and different word embeddings. Through this, we demonstrate the network's classification performance and how it can be applied to patent document classification. The results provide knowledge management researchers, corporate managers, and practitioners with theoretical and practical guidance on a patent classification technique that can be applied in corporate knowledge management.
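
    The core mechanism of a hierarchical attention network — word-level attention pooled into sentence vectors, then sentence-level attention pooled into a document vector — can be sketched in plain Python. The context vectors are learned parameters in the real network but are fixed inputs here, and all names are illustrative (the GRU encoders and classifier layer of the full architecture are omitted):

    ```python
    import math

    def softmax(scores):
        # numerically stable softmax over a list of scores
        exps = [math.exp(s - max(scores)) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def attend(vectors, context):
        # score each vector against a context vector and return the
        # attention-weighted average of the vectors
        scores = [sum(v * c for v, c in zip(vec, context)) for vec in vectors]
        weights = softmax(scores)
        dim = len(vectors[0])
        return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]

    def hierarchical_encode(document, word_ctx, sent_ctx):
        # word-level attention inside each sentence, then sentence-level
        # attention over the resulting sentence vectors
        sentence_vecs = [attend(sentence, word_ctx) for sentence in document]
        return attend(sentence_vecs, sent_ctx)
    ```

    For patent classification, the resulting document vector would be fed to a softmax classifier over the patent classification codes; the two-level attention is what lets the model weight informative words and sentences in an abstract more heavily.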