• Title/Summary/Keyword: Big data Processing

Information Technology Infrastructure for Agriculture Genotyping Studies

  • Pardamean, Bens;Baurley, James W.;Perbangsa, Anzaludin S.;Utami, Dwinita;Rijzaani, Habib;Satyawan, Dani
    • Journal of Information Processing Systems / v.14 no.3 / pp.655-665 / 2018
  • In an effort to increase agricultural productivity, the Indonesian Center for Agricultural Biotechnology and Genetic Resources Research and Development has conducted a variety of genomic studies using high-throughput DNA genotyping and sequencing. The large quantity of data (big data) produced by these biotechnologies requires a high-performance data management system to store, back up, and secure the data. Additionally, these genetic studies are computationally demanding, requiring high-performance processors and memory for data processing and analysis. Reliable, high-bandwidth network connectivity for data transfer is essential, as are database applications and statistical tools that support cleaning, quality control, querying on specific criteria, and exporting to various formats; these capabilities are important for generating high-yield crop varieties and improving future agricultural strategies. This manuscript presents a reliable, secure, and scalable information technology infrastructure tailored to Indonesian agriculture genotyping studies.

Design of Distributed Hadoop Full Stack Platform for Big Data Collection and Processing (빅데이터 수집 처리를 위한 분산 하둡 풀스택 플랫폼의 설계)

  • Lee, Myeong-Ho
    • Journal of the Korea Convergence Society / v.12 no.7 / pp.45-51 / 2021
  • With the rapid shift to non-face-to-face environments and mobile-first strategies, the explosive yearly growth of structured and unstructured data demands new decision making and services based on big data in all fields. However, there have been few reference cases that use the Hadoop Ecosystem to collect this rapidly growing data, load it into a standard platform applicable to practical environments, and then store and process the refined data in a relational database. In this study, unstructured data retrieved by keyword from social network services was therefore collected on Hadoop 2.0 through three virtual machine servers in a Spring Framework environment and loaded into the Hadoop Distributed File System and HBase. Based on the loaded data, a platform was designed and implemented that uses a morpheme analyzer to store standardized big data in a relational database. Future research should continue on clustering, classification, and analysis with machine learning, using Hive or Mahout for deeper data analysis.

A Study on the Application of Natural Language Processing in Health Care Big Data: Focusing on Word Embedding Methods (보건의료 빅데이터에서의 자연어처리기법 적용방안 연구: 단어임베딩 방법을 중심으로)

  • Kim, Hansang;Chung, Yeojin
    • Health Policy and Management / v.30 no.1 / pp.15-25 / 2020
  • While healthcare data sets include extensive information about patients, many researchers are limited in analyzing them by intrinsic characteristics such as heterogeneity, longitudinal irregularity, and noise. In particular, since the majority of medical history information is recorded in text codes, the use of such information has been limited by the high dimensionality of the explanatory variables. To address this problem, recent studies have applied word embedding techniques, originally developed for natural language processing, and obtained positive results in terms of dimensionality reduction and the accuracy of the prediction model. This paper reviews deep learning-based natural language processing techniques (word embedding) and summarizes research cases that have used those techniques in the health care field. Finally, we propose a research framework for applying deep learning-based natural language processing to the analysis of domestic health insurance data.

Visualizing Unstructured Data using a Big Data Analytical Tool R Language (빅데이터 분석 도구 R 언어를 이용한 비정형 데이터 시각화)

  • Nam, Soo-Tai;Chen, Jinhui;Shin, Seong-Yoon;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.151-154 / 2021
  • Big data analysis is the process of discovering meaningful new correlations, patterns, and trends in large volumes of data held in data stores, and of creating new value from them. Most big data analysis methods therefore draw on data mining, machine learning, natural language processing, and pattern recognition techniques used in existing statistics and computer science. In addition, with the R language, a big data tool, analysis results can be expressed through various visualization functions applied to pre-processed text data. The data analyzed in this study consisted of 21 papers published in March 2021 in the journals of the Korea Institute of Information and Communication Engineering. In the final analysis results, the most frequently mentioned keyword was "Data", which ranked first with 305 occurrences. Based on the results of the analysis, the limitations of the study and its theoretical implications are suggested.
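
The keyword-frequency step described in this abstract can be illustrated with a minimal sketch. The study itself used R; the example below is written in Python purely for illustration, and the function name `keyword_frequencies`, the stopword list, and the two sample sentences are hypothetical and not taken from the paper.

```python
import re
from collections import Counter


def keyword_frequencies(texts, stopwords=frozenset()):
    """Count how often each lowercase word occurs across all texts."""
    counts = Counter()
    for text in texts:
        # Pre-processing: lowercase the text and keep alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts


# Hypothetical sample input; the study analyzed 21 papers instead.
sample_abstracts = [
    "Big data analysis discovers patterns in data.",
    "Data mining and machine learning process data.",
]
STOPWORDS = frozenset({"in", "and", "the", "of"})

counts = keyword_frequencies(sample_abstracts, STOPWORDS)
top_keywords = counts.most_common(3)
print(top_keywords[0])
```

A ranked list such as `top_keywords` is the kind of input that a bar chart or word cloud would typically visualize.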

An Automatic Issues Analysis System using Big-data (빅데이터를 이용한 자동 이슈 분석 시스템)

  • Choi, Dongyeol;Ahn, Eungyoung
    • The Journal of the Korea Contents Association / v.20 no.2 / pp.240-247 / 2020
  • There have been many efforts to understand the trends of rapidly changing IT environments. From a management point of view, social systems now need to be prepared in advance by using Big-data. This research implements an Issue Analysis System for Big-data based on Artificial Intelligence. The paper aims to confirm the feasibility of a new technology for Big-data processing through the proposed Issue Analysis System. We propose a technique for semantic reasoning and pattern analysis based on AI and show that the proposed method can feasibly handle Big-data. We seek to verify that the proposed method is useful in dealing with Big-data by applying the latest security issues to the system. The experiments show the potential of the proposed method as a base technology for dealing with Big-data for various purposes.

A Study on Recognition of Artificial Intelligence Utilizing Big Data Analysis (빅데이터 분석을 활용한 인공지능 인식에 관한 연구)

  • Nam, Soo-Tai;Kim, Do-Goan;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.129-130 / 2018
  • Big data analysis is a technique for effectively analyzing unstructured data, such as Internet content, social network services, web documents generated in the mobile environment, e-mail, and social data, as well as well-formed structured data in a database. Most big data analysis techniques are data mining, machine learning, natural language processing, and pattern recognition, which were already used in statistics and computer science. Global research institutes have identified big data analysis as the most noteworthy new technology since 2011. Companies in most industries are therefore making efforts to create new value through the application of big data. In this study, we used Social Matrics, a big data analysis tool of Daum communications, to analyze public perceptions of the keyword "Artificial Intelligence" for a one-month period as of May 19, 2018. The results of the big data analysis are as follows. First, the top related search keyword for "Artificial Intelligence" was found to be "technology" (4,122). This study suggests theoretical implications based on the results.

Research on the Development of Big Data Analysis Tools for Engineering Education (공학교육 빅 데이터 분석 도구 개발 연구)

  • Kim, Younyoung;Kim, Jaehee
    • Journal of Engineering Education Research / v.26 no.4 / pp.22-35 / 2023
  • As information and communication technology has developed remarkably, it has become possible to analyze various types of large-volume data generated at near real-time speed, and reliable value creation based on such analysis has become possible. Big data analysis of this kind is becoming an important means of supporting decision-making based on scientific figures. The purpose of this study is to develop a big data analysis tool that can analyze large amounts of data generated through engineering education. The tasks of this study are as follows. First, a database is designed to store the information of entries in the National Creative Capstone Design Contest. Second, the pre-processing procedure required for analysis with the big data analysis tool is examined. Finally, the data are analyzed using the developed big data analysis tool. In this study, 1,784 works submitted to the National Creative Comprehensive Design Contest from 2014 to 2019 were analyzed. When the top 10 words were selected through topic analysis, 'robot' ranked first from 2014 to 2019, and energy, drones, ultrasound, solar energy, and IoT appeared with high frequency. This result seems to reflect the current core topics and technology trends of the 4th Industrial Revolution. In addition, owing to the nature of the Capstone Design Contest, it seems that students majoring in electrical/electronic, computer/information and communication, mechanical, and chemical/new materials engineering, who can submit complete products for problem solving, were selected. The significance of this study is that its results can be used in the field of engineering education as basic data for developing educational content and teaching methods that reflect industry and technology trends. Furthermore, the results of big data analysis related to engineering education are expected to serve as a means of preparing preemptive countermeasures when establishing education policies that reflect social changes.

Dynamic Load Management Method for Spatial Data Stream Processing on MapReduce Online Frameworks (맵리듀스 온라인 프레임워크에서 공간 데이터 스트림 처리를 위한 동적 부하 관리 기법)

  • Jeong, Weonil
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.8 / pp.535-544 / 2018
  • As the spread of mobile devices equipped with various sensors and high-quality wireless network communications functions expands, the amount of spatio-temporal data generated from mobile devices in various service fields is rapidly increasing. In conventional approaches to processing large volumes of real-time spatio-temporal streams, it is very difficult to apply a Hadoop-based spatial big data system, designed as a batch processing platform, to a real-time service for spatio-temporal data streams. This paper extends the MapReduce online framework to support real-time query processing for continuous-input, spatio-temporal data streams, and proposes a load management method that distributes overloads for efficient query processing. The proposed scheme balances load across nodes dynamically, based on the inflow rate and the load factor of the input data under the space partitioning. Experiments show that efficient query processing can be supported by distributing the spatial data stream of an area to shared resources when load management is required in that specific area.
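
The general idea of rebalancing spatial partitions by inflow rate and load factor can be sketched as follows. This is a simplified greedy illustration under stated assumptions, not the method proposed in the paper: the function `rebalance`, the definition of load factor as total inflow divided by node capacity, the 0.8 threshold, and all sample numbers are hypothetical.

```python
def rebalance(assignment, inflow, capacity, threshold=0.8):
    """Greedily move partitions off overloaded nodes.

    assignment: dict mapping node -> set of partition ids
    inflow:     dict mapping partition id -> records per second (assumed unit)
    capacity:   dict mapping node -> records per second it can process
    Returns a new assignment; the input is left unchanged.
    """
    result = {node: set(parts) for node, parts in assignment.items()}

    def load_factor(node):
        # Assumed definition: total inflow of the node's partitions / capacity.
        return sum(inflow[p] for p in result[node]) / capacity[node]

    max_moves = sum(len(parts) for parts in result.values())
    for _ in range(max_moves):  # bound the number of moves to guarantee termination
        hot = max(result, key=load_factor)
        cold = min(result, key=load_factor)
        if load_factor(hot) <= threshold or len(result[hot]) <= 1:
            break  # no node is overloaded, or nothing left to move
        part = min(result[hot], key=lambda p: inflow[p])  # smallest stream first
        cold_after = (sum(inflow[p] for p in result[cold]) + inflow[part]) / capacity[cold]
        if cold_after >= load_factor(hot):
            break  # the move would not improve the balance
        result[hot].remove(part)
        result[cold].add(part)
    return result


# Hypothetical example: node n1 is overloaded (load factor 0.95).
assignment = {"n1": {"a", "b", "c"}, "n2": {"d"}}
inflow = {"a": 450, "b": 300, "c": 200, "d": 100}
capacity = {"n1": 1000, "n2": 1000}

balanced = rebalance(assignment, inflow, capacity)
print(balanced)
```

Moving the smallest stream first keeps each step cheap; a real system would also have to account for query state migration, which this sketch ignores.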

Databases and tools for constructing signal transduction networks in cancer

  • Nam, Seungyoon
    • BMB Reports / v.50 no.1 / pp.12-19 / 2017
  • Traditionally, biologists have devoted their careers to studying individual biological entities of their own interest, partly due to a lack of available data regarding those entities. Large, high-throughput data, too complex for conventional processing methods (i.e., "big data"), has accumulated in cancer biology and is freely available in public data repositories. These challenges urge biologists to examine their biological entities of interest using novel approaches, starting with repository data retrieval. Essentially, these revolutionary changes demand new interpretations of huge datasets at a systems level, through so-called "systems biology". One of the representative applications of systems biology is to generate a biological network from high-throughput big data, providing a global map of molecular events associated with specific phenotype changes. In this review, we introduce the repositories of cancer big data and cutting-edge systems biology tools for network generation and improved identification of therapeutic targets.

An Efficient Log Data Processing Architecture for Internet Cloud Environments

  • Kim, Julie;Bahn, Hyokyung
    • International Journal of Internet, Broadcasting and Communication / v.8 no.1 / pp.33-41 / 2016
  • Big data management is becoming an increasingly important issue in both industry and the academic information science community. One of the important categories of big data generated by software systems is log data. Service providers generally use log data to improve their services, and it can also be used to improve system reliability. In this paper, we propose a novel big data management architecture specialized for log data. The proposed architecture provides a scalable log management system that consists of client-side and server-side modules for efficient handling of log data. To support large volumes of simultaneous log data from multiple clients, we adopt the Hadoop infrastructure in the server-side file system for storing and managing log data efficiently. We implement the proposed architecture to support various client environments and validate its efficiency through measurement studies. The results show that the proposed architecture performs better than the existing logging architecture by 42.8% on average. All components of the proposed architecture are implemented with open source software, and the developed prototypes are now publicly available.