• Title/Summary/Keyword: Big Data Processing

Machine Learning Technology Trends for Big Data Processing (빅데이터 활용을 위한 기계학습 기술동향)

  • Lim, S.J.; Min, O.K.
    • Electronics and Telecommunications Trends / v.27 no.5 / pp.55-63 / 2012
  • With the arrival of the big data era, artificial intelligence is again attracting attention as a technology for analyzing big data and exploiting it in intelligent services. This article introduces trends in big data processing within machine learning, one of the component technologies of artificial intelligence. We summarize currently available parallel-processing-based machine learning, ongoing projects built on machine learning over big data, and domain adaptation techniques that make machine learning easy to apply across diverse fields (one such technique is sketched below).
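
One domain adaptation technique of the kind the survey covers is importance weighting for covariate shift. The sketch below is a hedged illustration on synthetic data, not code from the survey: a classifier learns to distinguish source from target examples, and each source example is reweighted by the estimated odds of belonging to the target domain before the task model is trained.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 2))   # training (source) domain
target = rng.normal(0.5, 1.0, size=(200, 2))   # deployment (target) domain

# Domain classifier: label 0 = source, 1 = target.
X = np.vstack([source, target])
d = np.array([0] * len(source) + [1] * len(target))
clf = LogisticRegression().fit(X, d)

# Importance weight for each source example: p(target|x) / p(source|x).
p_target = clf.predict_proba(source)[:, 1]
weights = p_target / (1.0 - p_target)
print("mean importance weight on source data:", weights.mean().round(3))
# These weights would be passed as sample_weight when fitting the actual
# task model on the labeled source data.
```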

Design of a Hadoop-based Platform to Improve IoT-based Big Data Processing Efficiency (IoT 기반 빅데이터 효율성 향상을 위한 하둡기반 플랫폼 설계)

  • Jang, Kyungsung; Bae, Sang Hyun
    • Journal of Integrative Natural Science / v.13 no.3 / pp.114-119 / 2020
  • To overcome the data error rate and inefficient resource utilization caused by the frequent transmissions that occur when building IoT-based big data systems, and to address the shortcomings of open-source Hadoop systems, this study analyzes results obtained on pure Hadoop, presents a guide for sizing the capacity of a big data system based on Hadoop 2.x, and proposes a platform that applies Hadoop ecosystem software as the basis for capacity sizing (a back-of-the-envelope sizing sketch follows).
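
As a hedged illustration of the kind of Hadoop 2.x capacity sizing the abstract describes, the sketch below applies a common sizing rule (raw ingest × retention × HDFS replication, plus scratch-space overhead). Every number and the formula itself are assumptions for illustration, not the paper's guide.

```python
import math

daily_ingest_gb = 50          # raw IoT data arriving per day (assumed)
retention_days = 365
replication = 3               # HDFS default replication factor
temp_overhead = 0.25          # scratch space for MapReduce intermediates
usable_per_node_gb = 8 * 1024 * 0.7  # 8 TB per node, ~70% usable for HDFS

raw_gb = daily_ingest_gb * retention_days
hdfs_gb = raw_gb * replication * (1 + temp_overhead)
nodes = math.ceil(hdfs_gb / usable_per_node_gb)

print(f"raw data: {raw_gb:,} GB")
print(f"HDFS capacity needed: {hdfs_gb:,.0f} GB")
print(f"DataNodes required: {nodes}")
```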

Medical image control process improvement based on Cardiac PACS (Cardiac PACS 구축에 따른 의료영상 관리 프로세스 개선)

  • Jung, Young-Tae
    • Korean Journal of Digital Imaging in Medicine / v.16 no.1 / pp.35-42 / 2014
  • Heart-related specialty images are classified as Cardiac US, XA, CT, and MRI. Because image compression, management, and clinical support raise several problems, most large hospitals have in recent years built a separate Cardiac PACS. This creates conflict among internal colleagues, patients, and guardians, since two data processing servers operate independently within a single medical center. We therefore review the current situation at the medical facility and suggest an alternative model for an optimal medical image management process.

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Recently, growing demand for big data analysis has driven vigorous development of related technologies and tools, while advances in IT and the spread of smart devices are producing enormous amounts of data. As a result, data analysis technology is rapidly becoming popular, attempts to gain insight through data analysis continue to increase, and big data analysis is expected to become more important across industries for the foreseeable future. Big data analysis has generally been performed by a small number of experts and delivered to those who request it. However, rising interest in big data analysis has invigorated programming education and produced many analysis tools, so the entry barriers are gradually lowering, analysis technology is spreading, and analyses are increasingly expected to be performed by the requesters themselves. Alongside this, interest in unstructured data, especially text, keeps growing: new web platforms and techniques generate text data in bulk and encourage active attempts to analyze it, and the results of text analysis are used in many fields. Text mining is a concept that embraces the various theories and techniques for text analysis; among them, topic modeling is one of the most widely used and studied. Topic modeling extracts the major issues from a large collection of documents, identifies the documents corresponding to each issue, and returns them as clusters; it is considered very useful because it reflects the semantic elements of documents. Traditional topic modeling is based on the distribution of key terms across the entire collection, so the entire collection must be analyzed at once to identify the topic of each document. This makes analysis slow when topic modeling is applied to many documents, and it raises a scalability problem: processing time increases steeply with the number of analysis objects, which is especially noticeable when documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling: divide the documents into sub-units and derive topics by running topic modeling on each unit. This enables topic modeling over a large number of documents with limited system resources, improves processing speed, and can significantly reduce analysis time and cost because documents can be analyzed in each location without first combining them. Despite these advantages, the approach has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire collection is unclear: local topics can be identified for each document, but global topics cannot. Second, a method is needed to measure the accuracy of such a methodology; taking the global topics as the ideal answer, the deviation of each local topic from the global topics must be measured. Because of these difficulties, this approach has been studied less than other topic modeling methods. In this paper, we propose a topic modeling approach that solves both problems. We divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced global set (RGS) consisting of delegate documents extracted from each local set. We address the first problem by mapping RGS topics to local topics (a hedged sketch of this mapping follows), and we verify the accuracy of the proposed methodology by detecting whether documents are assigned the same topic in the global and local results. Using 24,000 news articles, we conduct experiments to evaluate the practical applicability of the proposed methodology. An additional experiment confirms that it produces results similar to topic modeling over the entire collection, and we propose a reasonable method for comparing the results of both approaches.
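
Below is a hedged sketch of the divide-and-conquer idea the abstract describes: fit LDA separately on each local set and on a reduced global set, then map each local topic to its nearest global topic by cosine similarity of their topic-word distributions. The toy corpus, the even split, and the every-third-document delegate rule are illustrative assumptions, not the paper's procedure.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy corpus standing in for the 24,000 news articles.
docs = [
    "stock market trading price investor",
    "market price economy stock fund",
    "soccer match goal player league",
    "player team match season goal",
    "election vote party candidate policy",
    "candidate policy vote government party",
] * 10

# Shared vocabulary so topic-word vectors are comparable across models.
vec = CountVectorizer()
X = vec.fit_transform(docs)

def fit_lda(rows, n_topics):
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(X[rows])
    # Normalize rows into topic-word probability distributions.
    return lda.components_ / lda.components_.sum(axis=1, keepdims=True)

n = X.shape[0]
halves = [np.arange(0, n // 2), np.arange(n // 2, n)]  # two local sets

# "RGS": here simply every third document from each local set; the
# paper's delegate-extraction rule is more elaborate.
rgs_rows = np.concatenate([h[::3] for h in halves])
global_topics = fit_lda(rgs_rows, n_topics=3)

for i, rows in enumerate(halves):
    local_topics = fit_lda(rows, n_topics=3)
    sim = cosine_similarity(local_topics, global_topics)
    mapping = sim.argmax(axis=1)  # each local topic -> closest global topic
    print(f"local set {i}: local->global topic map = {mapping}")
```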

UB-IOT Modeling for Pattern Analysis of the Real-Time Biological Data (실시간 생체 데이터의 패턴분석을 위한 UB-IOT 모델링)

  • Shin, Yoon Hwan; Shin, Ye Ho; Park, Hyun Woo; Ryu, Keun Ho
    • KIPS Transactions on Software and Data Engineering / v.5 no.2 / pp.95-106 / 2016
  • Biometric data can differ from person to person and is closely related to Sasang constitutional medicine. Biometric data includes not only heart rate, blood pressure, medical history, degree of aging, and body mass index, but also serves as a reference measure for judging a person's state of health, so it must be reproduced in forms suited to each application. Previous studies applied only a snapshot taken at a single moment, so even though biometric data changes in real time, the continuity of time was excluded. To solve this problem, this study proposes a pattern analysis model for biometric data that incorporates the continuity of time in a big data environment composed of biometric data (a minimal sliding-window sketch follows). The proposed model can help determine needle positions carefully when electronic acupuncture is used for care and health promotion.
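
A minimal way to incorporate the continuity of time the abstract emphasizes is a sliding window over the real-time stream, so each sample is judged against its recent history rather than as an isolated snapshot. The sketch below is a hypothetical illustration on invented heart-rate values, not the paper's UB-IoT model.

```python
from collections import deque
from statistics import mean, stdev

WINDOW = 10  # samples per rolling window (assumed)

def detect_anomalies(stream, threshold=2.0):
    """Yield (index, value) where a sample deviates from its recent window."""
    window = deque(maxlen=WINDOW)
    for i, bpm in enumerate(stream):
        if len(window) == WINDOW:
            mu, sigma = mean(window), stdev(window)
            if sigma > 0 and abs(bpm - mu) > threshold * sigma:
                yield i, bpm
        window.append(bpm)

# Invented heart-rate stream with one abrupt deviation.
heart_rate = [72, 74, 71, 73, 75, 72, 70, 74, 73, 72, 110, 73, 71]
for i, bpm in detect_anomalies(heart_rate):
    print(f"sample {i}: {bpm} bpm deviates from the rolling window")
```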

Hadoop and MapReduce (하둡과 맵리듀스)

  • Park, Jeong-Hyeok; Lee, Sang-Yeol; Kang, Da Hyun; Won, Joong-Ho
    • Journal of the Korean Data and Information Science Society / v.24 no.5 / pp.1013-1027 / 2013
  • As the need for large-scale data analysis rapidly increases, Hadoop, a platform that realizes large-scale data processing, and MapReduce, Hadoop's internal computational model, are receiving great attention. This paper reviews the basic concepts of Hadoop and MapReduce that data analysts familiar with statistical programming need, through examples that combine the R programming language with Hadoop (an illustrative word-count sketch follows).
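
The paper's own examples combine R with Hadoop; as a stand-in, the sketch below shows the classic word count in the MapReduce style in Python, with an in-memory sort playing the role of Hadoop's shuffle phase. The same mapper/reducer pair could be run over stdin/stdout with Hadoop Streaming; this is an illustration, not code from the paper.

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word."""
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    """Reduce phase: sum counts per word (input must be sorted by key)."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    text = ["big data needs big tools", "hadoop runs mapreduce"]
    shuffled = sorted(mapper(text))   # stand-in for Hadoop's shuffle/sort
    for word, total in reducer(shuffled):
        print(word, total)
```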

Efficient Multimedia Data File Management and Retrieval Strategy on Big Data Processing System

  • Lee, Jae-Kyung; Shin, Su-Mi; Kim, Kyung-Chang
    • Journal of the Korea Society of Computer and Information / v.20 no.8 / pp.77-83 / 2015
  • The storage and retrieval of multimedia data is becoming increasingly important in many application areas, including records management, video (CCTV) management, and the Internet of Things (IoT). In these applications, the number of files containing multimedia that must be stored and managed is tremendous and constantly growing. In this paper, we propose a technique for retrieving a very large number of multimedia files using the Hadoop framework. Our strategy is based on managing metadata that describes the characteristics of the files stored in the Hadoop Distributed File System (HDFS). The metadata schema is represented in HBase and queried using SQL-on-Hadoop engines (Hive, Tajo); HBase, Hive, and Tajo are all part of the Hadoop ecosystem (a hypothetical lookup sketch follows). A preliminary experiment on multimedia data files stored in HDFS shows the viability of the proposed strategy.
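
A hedged sketch of the metadata strategy follows: per-file metadata lives in an HBase table keyed by HDFS path, and files are located by attribute lookup instead of scanning HDFS itself. It uses the happybase HBase client; the host, table, and column names are hypothetical, and the paper's SQL-on-Hadoop lookups (Hive, Tajo) are replaced here by a simple scan.

```python
import happybase

# Assumed HBase Thrift server host and a hypothetical metadata table.
conn = happybase.Connection("hbase-master")
table = conn.table("multimedia_meta")

# Index a file: row key = HDFS path, columns = descriptive metadata.
table.put(b"/data/cctv/2015/cam3/0001.mp4", {
    b"meta:type": b"video/mp4",
    b"meta:camera": b"cam3",
    b"meta:date": b"2015-06-01",
    b"meta:duration_s": b"600",
})

# Retrieve candidate files by metadata instead of scanning HDFS.
for path, data in table.scan(columns=[b"meta:camera", b"meta:date"]):
    if data.get(b"meta:camera") == b"cam3":
        print("matching file:", path.decode())
```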

Data analysis of 4M data in small and medium enterprises (빅데이터 도입을 위한 중소제조공정 4M 데이터 분석)

  • Kim, Jae Sung; Cho, Wan Sup
    • Journal of the Korean Data and Information Science Society / v.26 no.5 / pp.1117-1128 / 2015
  • To secure an important competitive advantage in manufacturing, automation and information systems have been introduced into manufacturing processes; however, small and medium enterprises have not yet brought the power of information to the manufacturing floor. They manage their processes based on operator experience and hand-written data, which limits their ability to clearly identify the causes of defects when low-grade goods occur. In this study, we analyze the critical factors that affect the quality of a manufacturing process in terms of 4M. We also studied the automobile-parts processing of small and medium manufacturing enterprises, currently managed with hand-written records, in order to collect those records and to utilize sensor data in the future. The analysis shows no difference in defective quantity across machines, while raw materials, production quality, and task tracking show significant differences (a hypothetical test of this kind is sketched below).
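
The finding of "no difference across machines" suggests a test such as a one-way ANOVA on defect counts per machine. The sketch below is a hypothetical illustration with invented numbers, not the paper's 4M dataset or its exact analysis.

```python
from scipy import stats

# Invented defect counts per production run for three machines.
defects_machine_a = [3, 5, 4, 6, 5, 4]
defects_machine_b = [4, 5, 3, 5, 6, 4]
defects_machine_c = [5, 4, 4, 6, 5, 3]

f_stat, p_value = stats.f_oneway(defects_machine_a,
                                 defects_machine_b,
                                 defects_machine_c)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
if p_value > 0.05:
    print("no significant difference in defect counts across machines")
```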

A study on Data Context-Based Risk Measurement Method for Pseudonymized Information Processing

  • Kim, Dong-Hyun
    • Journal of the Korea Society of Computer and Information / v.27 no.6 / pp.53-63 / 2022
  • Recently, as digital transformation accelerates due to the COVID-19 pandemic, data that improves individual quality of life is being used in large quantities, and more rigorous de-identification procedures are required to utilize personal information, the most valuable kind of data. In Korea, de-identification procedures are prescribed through amendments to laws and guidelines, but ambiguous processing standards and subjective risk measurement methods leave the field without a methodology for measuring the level of de-identification. This paper compares and analyzes the current status of domestic and international policies and guidelines on de-identification to derive points for improvement, proposes a data context-based risk measurement method centered on pseudonymized information processing, and verifies its validity. Verification through a Delphi survey and focus group interviews (FGI) confirmed that the need for the proposed methodology and the validity of its indicators were high.

A Study on Efficient Memory Management Using Machine Learning Algorithm

  • Park, Beom-Joo; Kang, Min-Soo; Lee, Minho; Jung, Yong Gyu
    • International journal of advanced smart convergence / v.6 no.1 / pp.39-43 / 2017
  • As industry grows, the amount of data grows exponentially, and analysis of that data serves as a predictive solution. As data sizes and processing speeds increase, artificial intelligence techniques, beyond simple big data analysis, have begun to be applied to new fields. In this paper, we propose a method for quickly applying a machine-learning-based algorithm through efficient resource allocation. The proposed algorithm learns the number of distinct values of each attribute and allocates the appropriate amount of memory to each (a hedged sketch of this idea follows). To compare performance, we measured execution time against the existing K-means algorithm and found that speed was improved.
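
A hedged reading of the idea is that the number of distinct values of each attribute is estimated first, and memory is then reserved per attribute in proportion to it. The sketch below illustrates that reading with an invented allocation rule and toy data; it is not the paper's algorithm.

```python
import csv
import io

# Toy dataset standing in for the attributes being profiled.
sample = io.StringIO(
    "machine,shift,defects\n"
    "A,day,3\nB,night,5\nA,night,4\nC,day,6\nB,day,5\n"
)
rows = list(csv.DictReader(sample))

# Learn the distinct-value count of each attribute.
distinct = {col: len({row[col] for row in rows}) for col in rows[0]}

BYTES_PER_ENTRY = 64  # assumed cost of one dictionary entry
budget = {col: n * BYTES_PER_ENTRY for col, n in distinct.items()}

for col in distinct:
    print(f"{col}: {distinct[col]} distinct values -> "
          f"reserve ~{budget[col]} bytes")
```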