• Title/Summary/Keyword: Big data Processing


A Study on Interdisciplinary Structure of Big Data Research with Journal-Level Bibliographic-Coupling Analysis (학술지 단위 서지결합분석을 통한 빅데이터 연구분야의 학제적 구조에 관한 연구)

  • Lee, Boram;Chung, EunKyung
    • Journal of the Korean Society for information Management
    • /
    • v.33 no.3
    • /
    • pp.133-154
    • /
    • 2016
  • An interdisciplinary approach has been recognized as one of the key strategies for addressing the varied and complex research problems of modern science. The purpose of this study is to investigate the interdisciplinary characteristics and structure of the field of big data. Among the 1,083 journals related to the field of big data, 420 journals (38.8%) were assigned multiple Subject Categories (SC) from the Web of Science, and 239 journals (22.1%) were assigned SCs from different fields. These results show that the field of big data exhibits interdisciplinary characteristics. In addition, a bibliographic coupling network analysis of the top 56 journals identified 10 clusters in the network. Among the 10 clusters, 7 were from the computer science field, focusing on technical aspects such as storing, processing, and analyzing the data. The cluster analysis also identified numerous research works analyzing and utilizing big data in various fields such as science & technology, engineering, communication, law, geography, and bio-engineering. Finally, by measuring three types of centrality (betweenness centrality, nearest centrality, triangle betweenness centrality) of the journals, computer science journals appeared to have strong impact and close relations to other fields in the network.
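
As a side note on the method described above, the following is only a minimal sketch, with hypothetical coupling counts and the networkx library assumed, of computing journal-level betweenness centrality from bibliographic-coupling strengths; the study's other two measures (nearest centrality and triangle betweenness centrality) are not standard networkx calls and are omitted.

```python
# Minimal sketch (hypothetical data, not the study's code): journal-level
# betweenness centrality on a bibliographic-coupling network with networkx.
import networkx as nx

# Hypothetical coupling data: (journal_a, journal_b, shared_reference_count)
couplings = [
    ("J. Machine Learning", "J. Data Mining", 42),
    ("J. Data Mining", "J. Bioinformatics", 17),
    ("J. Machine Learning", "J. Bioinformatics", 9),
    ("J. Data Mining", "J. Communication Studies", 5),
]

G = nx.Graph()
for a, b, shared in couplings:
    # networkx treats edge weights as distances for shortest paths, so use the
    # inverse of the coupling strength: strongly coupled journals are "closer".
    G.add_edge(a, b, distance=1.0 / shared)

# Betweenness centrality, one of the three measures named in the abstract.
betweenness = nx.betweenness_centrality(G, weight="distance")
for journal, score in sorted(betweenness.items(), key=lambda x: -x[1]):
    print(f"{journal}: {score:.3f}")
```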

Improving Legislation on the use of Healthcare Data for Research Purposes (보건의료 빅데이터의 연구목적 사용에 대한 법제 개선방안)

  • Park, Dae Woong;Jeong, Hyun Hak;Jeong, Myung Jin;Ryoo, Hwa Shin
    • The Korean Society of Law and Medicine
    • /
    • v.17 no.2
    • /
    • pp.315-346
    • /
    • 2016
  • With the development of big data processing technology, the potential value of healthcare big data has attracted much attention. In order to realize this potential value, research using healthcare big data is essential. However, the big data regulatory system, centered on the Personal Information Protection Act, does not take into account the aspect of big data as an economic asset and creates many obstacles to its use for research purposes. The regulatory system for healthcare information, centered on the primary purpose of patient treatment, should be improved in a way that keeps pace with the development of technology and makes the data easy to use in the public interest. To this end, it is necessary to examine trends in overseas legal systems that reflect concerns about balancing the protection and utilization of personal information. Based on the implications of those overseas legal systems, improvements to our legal system can be derived in the following directions. First, a legal system that is specialized for healthcare information and encompasses both protection and utilization is needed. The level of de-identification, which serves as an exception to the Personal Information Protection Act, should also be clearly defined. A legal basis for linking healthcare big data is necessary to create synergy effects in research. The introduction of an opt-out system should also be examined on the basis of foreign debates and social consensus. Most important of all, however, is the public's trust in these systems.


Big data for Speech and Language Processing (빅데이터 기반 음성언어 처리 기술)

  • Na, S.H.;Jung, H.Y.;Yang, S.I.;Kim, C.H.;Kim, Y.K.
    • Electronics and Telecommunications Trends
    • /
    • v.28 no.1
    • /
    • pp.52-61
    • /
    • 2013
  • Speech and language processing is the field that studies algorithms by which computers automatically understand and process natural human speech; its applications include automatic speech translation, spoken dialogue systems such as Siri, next-generation interfaces, and question answering systems. In particular, with the recent arrival of the big data era, speech and language processing has drawn attention as an essential technology for handling massive volumes of speech and text. At the same time, big data itself constitutes an enormous corpus and serves as a primary resource for improving the performance of speech and language processing technology. Accordingly, research that uses big data to improve speech and language processing has recently been pursued actively, and this article introduces the background and research trends of these studies.


CPC: A File I/O Cache Management Policy for Compute-Bound Workloads

  • Bahn, Hyokyung
    • International journal of advanced smart convergence
    • /
    • v.11 no.2
    • /
    • pp.1-6
    • /
    • 2022
  • With the emergence of the 4th industrial revolution era, compute-bound workloads with large memory footprints, such as big data processing, have increased dramatically. Even in such compute-bound workloads, however, we observe bulky I/Os while loading big data from storage to memory. Although the file I/O cache accelerates storage I/O performance, we found that the cache hit rate in such environments does not improve even when the file I/O cache capacity is increased, because of the special I/O reference patterns generated by compute-bound workloads. To cope with this situation, we propose a new file I/O cache management policy that significantly improves the cache hit rate for compute-bound workloads. Trace-driven simulations that replay the file I/O reference logs of compute-bound workloads show that the proposed cache management policy improves the cache hit rate by a large margin compared with the well-known CLOCK algorithm.
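
Since the abstract describes a trace-driven evaluation against CLOCK but does not spell out the proposed CPC policy itself, the following is only a sketch of the baseline setup: a small CLOCK cache replayed over a hypothetical I/O reference trace to measure the hit rate.

```python
# Minimal sketch of trace-driven cache simulation with the baseline CLOCK
# policy mentioned in the abstract; the CPC policy is not detailed there,
# so only the evaluation setup is illustrated.
class ClockCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []          # list of [block_id, reference_bit]
        self.index = {}          # block_id -> position in slots
        self.hand = 0

    def access(self, block_id):
        """Return True on a cache hit, False on a miss."""
        if block_id in self.index:
            self.slots[self.index[block_id]][1] = 1   # set reference bit
            return True
        if len(self.slots) < self.capacity:           # cache not yet full
            self.index[block_id] = len(self.slots)
            self.slots.append([block_id, 1])
            return False
        # Evict: advance the clock hand until a block with reference bit 0 is found.
        while self.slots[self.hand][1] == 1:
            self.slots[self.hand][1] = 0
            self.hand = (self.hand + 1) % self.capacity
        victim, _ = self.slots[self.hand]
        del self.index[victim]
        self.slots[self.hand] = [block_id, 1]
        self.index[block_id] = self.hand
        self.hand = (self.hand + 1) % self.capacity
        return False

def hit_rate(trace, capacity):
    cache = ClockCache(capacity)
    hits = sum(cache.access(block) for block in trace)
    return hits / len(trace)

# Hypothetical trace: a large sequential load (cache-unfriendly) followed by reuse.
trace = list(range(10_000)) + [i % 100 for i in range(10_000)]
print(f"CLOCK hit rate: {hit_rate(trace, capacity=1_000):.2%}")
```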

Research on Natural Language Processing Package using Open Source Software (오픈소스 소프트웨어를 활용한 자연어 처리 패키지 제작에 관한 연구)

  • Lee, Jong-Hwa;Lee, Hyun-Kyu
    • The Journal of Information Systems
    • /
    • v.25 no.4
    • /
    • pp.121-139
    • /
    • 2016
  • Purpose: In this study, we propose a special-purpose R package named "new_Noun()" to process the nonstandard texts that appear in various social networks. As interest in big data grows, the R analysis tool and open source software are also receiving more attention in many fields. Design/methodology/approach: With more than 9,000 packages available, R provides user-friendly functions for a variety of data mining, social network analysis, and simulation tasks, including statistical analysis, classification, prediction, clustering, and association analysis. In particular, "KoNLP", a natural language processing package for the Korean language, has reduced the time and effort required of many researchers. However, as social data increase, informal expressions of Hangeul (Korean characters) such as emoticons, informal terms, and symbols make natural language processing more difficult. Findings: To address these difficulties, this study investigates special algorithms that upgrade the existing open source natural language processing package. By utilizing the "KoNLP" package and analyzing the main functions of its noun extraction command, we developed a new integrated noun processing function, "new_Noun()", which improves noun extraction by more than 29.1% compared with the existing package.
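
The study's own code is an R extension of KoNLP; as a rough analogue only, the sketch below uses Python with the konlpy library (an assumption, not part of the study) to show the general idea of normalizing informal social-media text before extracting nouns.

```python
# Rough Python analogue (not the study's R code): strip emoticons and informal
# noise before noun extraction. Assumes the konlpy library, which wraps Korean
# morphological analyzers; the study itself extends the R "KoNLP" package.
import re
from konlpy.tag import Okt

def normalize(text: str) -> str:
    """Remove laughter/crying strings, decorative symbols, and extra spaces."""
    text = re.sub(r"[ㅋㅎㅠㅜ]{2,}", " ", text)   # e.g. ㅋㅋㅋ, ㅠㅠ
    text = re.sub(r"[~!^*♥☆★]+", " ", text)       # decorative symbols
    return re.sub(r"\s+", " ", text).strip()

okt = Okt()
raw = "오늘 빅데이터 분석 수업 너무 재밌다ㅋㅋㅋㅋ 최고♥♥"
print(okt.nouns(normalize(raw)))   # e.g. ['오늘', '빅데이터', '분석', '수업', '최고']
```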

Performance Comparison of Python and Scala APIs in Spark Distributed Cluster Computing System (Spark 기반에서 Python과 Scala API의 성능 비교 분석)

  • Ji, Keung-yeup;Kwon, Youngmi
    • Journal of Korea Multimedia Society
    • /
    • v.23 no.2
    • /
    • pp.241-246
    • /
    • 2020
  • Hadoop is a framework for processing large data sets in a distributed way across clusters of nodes. It has been a popular platform for processing big data, but in recent years other platforms have become competitive, depending on the characteristics of the application. Spark is a distributed platform that enables real-time data processing and improves overall processing performance over Hadoop by introducing in-memory processing instead of disk I/O. Whereas Hadoop is designed to work on Java and data analysis is processed using the Java API, Spark provides a variety of APIs for Scala, Python, Java, and R. In this paper, the goal is to find out whether the APIs of different programming languages affect performance in Spark. We chose two popular APIs: Python and Scala. Python is easy to learn and is widely used in the AI domain. Scala is a programming language with advantages for parallelism. Our experiment shows much faster processing with the Scala API than with the Python API. Further study is needed on the performance issues of AI-based analysis.
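
The paper's benchmark workload is not described in the abstract; the sketch below is only a hypothetical PySpark job illustrating the kind of aggregation that could be written with either the Python or the Scala API and timed for comparison.

```python
# Minimal PySpark sketch (hypothetical workload, not the paper's benchmark).
# An equivalent job can be written with the Scala API, which the paper reports
# to be considerably faster in its experiments.
import time
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("python-vs-scala-demo").getOrCreate()

start = time.time()
df = spark.range(0, 10_000_000)                         # synthetic data set
buckets = (df.withColumn("bucket", F.col("id") % 1000)  # group rows into 1,000 buckets
             .groupBy("bucket")
             .agg(F.sum("id").alias("total"))
             .count())
print(f"buckets: {buckets}, elapsed: {time.time() - start:.1f}s")
spark.stop()
```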

A Safety IO Throttling Method Inducting Differential End of Life to Improving the Reliability of Big Data Maintenance in the SSD based RAID (SSD기반 RAID 시스템에서 빅데이터 유지 보수의 신뢰성을 향상시키기 위한 차등 수명 마감을 유도하는 안전한 IO 조절 기법)

  • Lee, Hyun-Seob
    • Journal of Digital Convergence
    • /
    • v.20 no.5
    • /
    • pp.593-598
    • /
    • 2022
  • Recently, data production has grown explosively, and the storage systems that store this big data safely and quickly are evolving in various ways. A typical configuration is to use SSDs, with their fast data processing speed, as a RAID group that can maintain data reliably. However, since the NAND flash memory that composes an SSD deteriorates when writes are repeated beyond a certain number of times, the likelihood of simultaneous failures of multiple SSDs in a RAID group increases, which can result in serious reliability problems in which data cannot be recovered. To solve this problem, we propose a method of throttling IOs so that each SSD within a RAID group reaches the end of its life at a different time. The technique proposed in this paper utilizes SMART to monitor the state of each SSD and adjusts the number of IOs allocated in stages according to the data pattern used. This method has the advantage of preventing large-scale concurrent failures in RAID because it induces the SSDs to reach the end of their lives at different times.
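
The abstract does not give the exact throttling rule, so the following is only a sketch of the general idea: skewing write-IO allocation across the SSDs of a RAID group, based on hypothetical SMART wear values, so that the devices reach the end of their lives at different times rather than simultaneously.

```python
# Minimal sketch of the idea (the paper's exact throttling rule is not given in
# the abstract): skew write-IO allocation so that wear diverges across the SSDs
# of a RAID group and they reach end of life at different times.
def allocate_io_quota(wear_levels, total_ios, skew=2.0):
    """wear_levels: hypothetical SMART wear indicators in [0, 1], one per SSD.
    More-worn devices receive proportionally more writes, widening the gap
    between the remaining lifetimes of the group members."""
    weights = [(w + 0.01) ** skew for w in wear_levels]   # small offset avoids zero weights
    total = sum(weights)
    return [round(total_ios * wgt / total) for wgt in weights]

# Example: four SSDs in a RAID group with different wear states.
wear = [0.40, 0.30, 0.20, 0.10]
print(allocate_io_quota(wear, total_ios=10_000))
# The most-worn SSD gets the largest share, so it reaches end of life first and
# can be replaced before the other devices approach their write limits.
```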

Data Central Network Technology Trend Analysis using SDN/NFV/Edge-Computing (SDN, NFV, Edge-Computing을 이용한 데이터 중심 네트워크 기술 동향 분석)

  • Kim, Ki-Hyeon;Choi, Mi-Jung
    • KNOM Review
    • /
    • v.22 no.3
    • /
    • pp.1-12
    • /
    • 2019
  • Recently, research using big data and AI has emerged as a major issue in the ICT field, but the size of the big data used for research is growing exponentially. In addition, with existing network transmission methods, users point out that sending and receiving big data can take longer than copying the data to a hard disk and shipping it. Accordingly, researchers require dynamic and flexible network technology that can transmit data at high speed and accommodate various network structures. SDN/NFV technologies make it possible to program a network to suit the needs of its users, and they can readily address the network's flexibility and security problems. A further problem in performing AI is that centralized data processing cannot guarantee real-time operation, and network delays occur when traffic increases. To solve this problem, edge computing, which moves away from the centralized approach, should be used. In this paper, we investigate the concepts and research trends of SDN, NFV, and edge computing technologies, and analyze the trends of data central network technologies that combine these three technologies.

Presto Architecture Proposal Using Memory Caching in Big Data Environment (빅데이터 환경에서 메모리 캐싱을 활용한 Presto 아키텍처 제안)

  • Hwang, Sun-Hee;Kim, Tae-Won;Shin, Min-Kyu
    • Annual Conference of KIPS
    • /
    • 2019.10a
    • /
    • pp.89-92
    • /
    • 2019
  • As demand for running interactive analytic queries in big data environments has grown, data processing speed has become an important performance metric. Among the many big data processing engines, Presto has been widely used because its memory-based execution enables fast query processing. However, even Presto, a memory-based processing engine, has shown performance degradation in some cases where disk-based storage is used. This paper therefore proposes an architecture that uses memory caching through the Presto Memory Connector to improve big data processing performance. Data processing experiments were conducted to verify performance in cached and non-cached environments, and the results confirm that the proposed approach can provide improved performance. This work is intended to provide a basis for designing Presto architectures that use caching in distributed big data environments.
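
A minimal sketch of the caching idea, not the authors' configuration: it assumes a Presto coordinator at a hypothetical host, a `memory` catalog configured with the Memory Connector (connector.name=memory), and the presto-python-client library; a hot table is copied into the memory catalog so that repeated interactive queries avoid disk-backed storage.

```python
# Minimal sketch (assumed setup, not the paper's architecture): cache a hot table
# into Presto's Memory Connector so repeated interactive queries read from memory.
import prestodb

conn = prestodb.dbapi.connect(
    host="presto-coordinator.example.com",   # hypothetical coordinator host
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()

# Copy frequently queried data from disk-backed storage into the memory catalog.
cur.execute("""
    CREATE TABLE memory.default.sales_cached AS
    SELECT * FROM hive.default.sales WHERE sale_date >= DATE '2019-01-01'
""")
cur.fetchall()   # wait for the CTAS to finish

# Subsequent interactive queries hit the in-memory copy.
cur.execute("SELECT region, sum(amount) FROM memory.default.sales_cached GROUP BY region")
print(cur.fetchall())
```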

A Study of Big data-based Machine Learning Techniques for Wheel and Bearing Fault Diagnosis (차륜 및 차축베어링 고장진단을 위한 빅데이터 기반 머신러닝 기법 연구)

  • Jung, Hoon;Park, Moonsung
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.1
    • /
    • pp.75-84
    • /
    • 2018
  • Increasing the operation rate of components and stabilizing operation through timely management of core parts are crucial for improving the efficiency of the railroad maintenance industry. Demand for diagnosis technology that assesses the condition of rolling stock components using history management and automated big data analysis has increased, in order to improve the reliability of core components while reducing their maintenance cost amid the trend toward rapid maintenance. This study developed a big data platform-based system that manages rolling stock component condition by acquiring, processing, and analyzing, in real time, the big data generated at the onboard and wayside devices of railroad cars. The system can monitor the condition of railroad car components and system resources in real time. The study also proposed a machine learning technique that enables distributed and parallel processing of the acquired big data and automatic component fault diagnosis. A test using virtual instances generated on Amazon Web Services showed that the algorithm applying the distributed and parallel technology decreased the runtime, and confirmed that the fault diagnosis model using random forest machine learning predicts the condition of the bearing and wheel parts with 83% accuracy.
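
As an illustration of the final modeling step only, the sketch below trains a random forest classifier on hypothetical sensor-derived features with scikit-learn; the paper's actual features, data, and distributed processing pipeline are not shown in the abstract.

```python
# Minimal sketch (hypothetical features and data, not the paper's dataset):
# random forest classification of bearing/wheel condition from sensor features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 2_000
# Hypothetical features: vibration RMS, peak temperature, axle speed.
X = rng.normal(size=(n, 3))
# Hypothetical label: 1 = faulty when vibration and temperature are jointly high.
y = ((X[:, 0] + X[:, 1]) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print(f"accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```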