• Title/Summary/Keyword: Big data Processing


Research of Semantic Considered Tree Mining Method for an Intelligent Knowledge-Services Platform

  • Paik, Juryon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.5
    • /
    • pp.27-36
    • /
    • 2020
  • In this paper, we propose a method for deriving valuable but hidden information from data, the core foundation of the 4th Industrial Revolution, in pursuit of knowledge-based service fusion. Hyper-connected societies characterized by IoT inevitably produce big data, and deriving optimal services for problem situations from that data first requires processing it to discover valuable information. A data-centric IoT platform collects, stores, manages, and integrates data from various devices; it is in effect a type of middleware platform. Its purpose is to provide suitable solutions to challenging problems after processing and analyzing the data, which depends on efficient and accurate data analysis algorithms. To this end, we propose specially designed structures that store IoT data without losing its semantics, and we provide algorithms for discovering useful information, together with several definitions and proofs that show their soundness.

A Design on Informal Big Data Topic Extraction System Based on Spark Framework (Spark 프레임워크 기반 비정형 빅데이터 토픽 추출 시스템 설계)

  • Park, Kiejin
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.11
    • /
    • pp.521-526
    • /
    • 2016
  • Because online informal text data are massive in volume and unstructured in nature, traditional relational data model technologies are of limited use for storing and analyzing them. Moreover, real-time analysis of social users' reactions is hard to accomplish on massive, dynamically generated social data. In this paper, to easily capture the semantics of massive, informal online documents with an unsupervised learning mechanism, we design and implement an automatic topic extraction system based on the mass of the words that make up a document. The input data set for the proposed system is generated first, using an N-gram algorithm to build multi-word terms that capture the meaning of sentences precisely, and Hadoop and Spark (an in-memory distributed computing framework) are adopted to run the topic model. In the experiments, TB-level input data are preprocessed and the proposed topic extraction steps are applied. We conclude that the proposed system performs well at extracting meaningful topics in a timely manner because intermediate results come directly from main memory instead of being read from an HDD.
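The N-gram step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' Hadoop/Spark implementation: the function names `word_ngrams` and `ngram_counts`, the whitespace tokenizer, and the sample sentence are all assumptions made for illustration, and everything runs on a single machine.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, n=2):
    """Count multi-word terms in a text (hypothetical helper).

    Tokenization is a plain lowercase whitespace split, which is an
    assumption; a real pipeline would need language-aware tokenization.
    """
    tokens = text.lower().split()
    return Counter(" ".join(gram) for gram in word_ngrams(tokens, n))


# Illustrative input only; not data from the paper.
counts = ngram_counts("big data processing with big data tools", n=2)
print(counts.most_common(3))
```

Counts of such multi-word terms could then serve as the document-term input to a topic model, which is the role the abstract assigns to the N-gram stage.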

Improvement of early prediction performance of under-performing students using anomaly data (이상 데이터를 활용한 성과부진학생의 조기예측성능 향상)

  • Hwang, Chul-Hyun
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.11
    • /
    • pp.1608-1614
    • /
    • 2022
  • As competition between universities intensifies due to the recent decrease in the number of students, it is recognized as an essential task of universities to predict students who are underperforming at an early stage and to make various efforts to prevent dropouts. For this, a high-performance model that accurately predicts student performance is essential. This paper proposes a method to improve prediction performance by removing or amplifying abnormal data in a classification prediction model for identifying underperforming students. Existing anomaly data processing methods have mainly focused on deleting or ignoring data, but this paper presents a criterion to distinguish noise from change indicators, and contributes to improving the performance of predictive models by deleting or amplifying data. In an experiment using open learning performance data for verification of the proposed method, we found a number of cases in which the proposed method can improve classification performance compared to the existing method.

A Study on the Definition of Data Literacy for Elementary and Secondary Artificial Intelligence Education (초·중등 인공지능 교육을 위한 데이터 리터러시 정의 연구)

  • Kim, SeulKi;Kim, Taeyoung
    • Proceedings of the Korean Association of Information Education Conference (한국정보교육학회:학술대회논문집)
    • /
    • 2021.08a
    • /
    • pp.59-67
    • /
    • 2021
  • The development of AI technology has brought about a big change in our lives. As AI's influence grows from daily life to society and the economy, the importance of education on AI and data is also growing. In particular, the OECD Education Research Report and various domestic information and curriculum studies address data literacy and present it as an essential competency. Domestic and international studies show that the definition of data literacy differs in its specific content and scope from researcher to researcher. Thus, the definitions given in major studies on data literacy were analyzed from various angles. The definitions in these key studies were examined with word frequency analysis and with the Word2vec natural language processing method to analyze semantic similarity, and candidate terms were then selected based on the content elements of curriculum research to derive the definition 'understanding and using data to process information'. We hope that the definition of data literacy derived in this study will be revised and supplemented, and that further research will build on it to provide a sound foundation for educational research that develops students' future capabilities.
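Semantic similarity between word vectors of the kind Word2vec produces is commonly measured with cosine similarity. The sketch below shows only that measure; it does not train a Word2vec model, and the three-dimensional `toy_vectors` are invented values chosen for illustration, not embeddings from the study.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors; real embeddings would come from a trained model.
toy_vectors = {
    "data": [1.0, 0.0, 1.0],
    "information": [1.0, 0.0, 0.5],
    "melody": [0.0, 1.0, 0.0],
}

sim_related = cosine_similarity(toy_vectors["data"], toy_vectors["information"])
sim_unrelated = cosine_similarity(toy_vectors["data"], toy_vectors["melody"])
print(sim_related, sim_unrelated)
```

With trained embeddings, terms that appear in similar contexts across the analyzed definitions would score closer to 1, which is what makes the measure usable for grouping candidate terms.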


IoT Based Real-Time Indoor Air Quality Monitoring Platform for a Ventilation System (청정환기장치 최적제어를 위한 IoT 기반 실시간 공기질 모니터링 플랫폼 구현)

  • Uprety, Sudan Prasad;Kim, Yoosin
    • Journal of Internet Computing and Services
    • /
    • v.21 no.6
    • /
    • pp.95-104
    • /
    • 2020
  • In this paper, we propose a cloud-based platform for real-time indoor air quality monitoring and control using IoT sensor data such as PM10, PM2.5, CO2, VOCs, temperature, and humidity, which have a direct or indirect impact on indoor air quality. The system is connected to an air ventilator to manage and optimize indoor air quality. The proposed system has three main parts: first, an IoT data collection service that measures and collects indoor air quality in real time from the IoT sensor network; second, a big data processing pipeline that processes and stores the collected data on a cloud platform; and finally, a big data analysis and visualization service that gives real-time insight into indoor air quality through mobile and web applications. To apply the proposed system, IoT sensor kits were installed in three public day care centers, where indoor pollution can seriously affect the health and education of growing children. The analyzed results are visualized in the mobile and web applications. The impact of the ventilation system on indoor air quality was tested statistically, and the results show that indoor air quality was properly optimized.

A Travel Time Budget Estimation Using a Mobile Phone Signaling Data (통신 빅데이터를 활용한 통행시간예산 산출 연구)

  • Chung, Younshik;Nam, Sanggi;Song, Tai-Jin
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.38 no.3
    • /
    • pp.457-465
    • /
    • 2018
  • This study proposes a novel approach to exploring the "travel time budget (TTB)" using mobile phone signaling data (MPSD), which are passively generated between a mobile phone and a base station. The data analyzed in this study were provided by KT for 8 days (from May 19 to 26 in 2016). They comprised about 45 million signals passively generated by users whose nighttime stay area was classified as three areas in Mapo-gu, Seoul and in the city of Sejong. The estimation of TTB was implemented with various pre-processing techniques applied to the MPSD in a data-driven analysis. As a result, the TTBs of Mapo-gu, Seoul and Sejong were 82.94 and 80.70 minutes, respectively. The results of this study were also compared with those based on traditional methods. The authors expect that this result will help transport experts improve the use of MPSD.

RDFS Rule based Parallel Reasoning Scheme for Large-Scale Streaming Sensor Data (대용량 스트리밍 센서데이터 환경에서 RDFS 규칙기반 병렬추론 기법)

  • Kwon, SoonHyun;Park, Youngtack
    • Journal of KIISE
    • /
    • v.41 no.9
    • /
    • pp.686-698
    • /
    • 2014
  • Recently, large-scale streaming sensor data have emerged due to the explosive spread of smartphones, the diffusion of IoT and cloud computing technology, and the widespread adoption of IoT devices. Research on combining these data with semantic web technology is also being actively pursued, driven by growing demand for creating new value from data through data sharing and mash-ups in large-scale environments. However, the scale and streaming nature of the data raise major issues in the field of inference for creating new knowledge. For this reason, we propose an RDFS rule-based parallel reasoning scheme that provides services by processing large-scale streaming sensor data with semantic web technology. In the proposed scheme, each job of the Rete network algorithm, an existing rule inference algorithm, runs in parallel, and data are shared using HBase, a Hadoop database, as common storage. We implement the system and evaluate its performance using AWS data from the weather center as large-scale streaming sensor data.
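To make "RDFS rule-based reasoning" concrete, here is a minimal sketch of two standard RDFS entailment rules: rdfs9 (an instance of a class is also an instance of its superclasses) and rdfs11 (`rdfs:subClassOf` is transitive). It uses naive single-threaded forward chaining to a fixpoint, not the Rete network, parallel jobs, or HBase storage described in the abstract. The function name `rdfs_closure` and the sensor triples are hypothetical.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply rules rdfs9 and rdfs11 repeatedly until no new triple appears."""
    facts = set(triples)
    while True:
        derived = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # Both rules join on a subClassOf triple whose subject
                # matches the object of the first triple.
                if p2 != SUBCLASS or o != s2:
                    continue
                if p == TYPE:
                    derived.add((s, TYPE, o2))       # rdfs9
                elif p == SUBCLASS:
                    derived.add((s, SUBCLASS, o2))   # rdfs11
        if derived <= facts:
            return facts
        facts |= derived


# Hypothetical sensor ontology, for illustration only.
triples = [
    ("sensor1", TYPE, "TemperatureSensor"),
    ("TemperatureSensor", SUBCLASS, "Sensor"),
    ("Sensor", SUBCLASS, "Device"),
]
closure = rdfs_closure(triples)
for triple in sorted(closure):
    print(triple)
```

The nested loop is quadratic in the number of facts on every pass. Avoiding that kind of repeated matching is what Rete-style networks and parallel execution are designed for on large streaming inputs.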

A Study on the Safe Use of Data in the Digital Healthcare Industry Based on the Data 3 Act (데이터 3법 기반 디지털 헬스케어 산업에서 안전한 데이터 활용에 관한 연구)

  • Choi, Sun-Mi;Kim, Kyoung-Jin
    • Journal of the Korea Convergence Society
    • /
    • v.13 no.4
    • /
    • pp.25-37
    • /
    • 2022
  • The government and private companies are endeavoring to help the digital healthcare industry grow. This includes easing regulations on the big data industry, such as the amendment of the Data 3 Act. Despite these efforts, however, there have been constant demands for the amendment of laws related to the medical field and for securing medical data transmissions. In this paper, the Data 3 Act of Korea and the legal system related to healthcare are examined. Then the legal, institutional, and technical aspects of the strategies are compared to understand the issues and implications. Based on this, a legal and institutional strategy suitable for the digital healthcare industry in Korea is suggested. Additionally, a direction for improving social perception is proposed, along with technical measures such as safe de-identification processing and data transmission. This study aims to contribute to the spread of various convergent industries along with the digital healthcare industry.

The Strength Analysis of Passenger Car Seat Frame (승용차 시트프레임의 강도해석)

  • 임종명;장인식
    • Transactions of the Korean Society of Automotive Engineers
    • /
    • v.11 no.6
    • /
    • pp.205-212
    • /
    • 2003
  • This paper may provide basic design data for a safer car seat mechanism and for the quality of the materials used, obtained by determining the dynamic behavior of a passenger restrained by a seat belt during a collision. A computer simulation using the finite element method is used to accomplish this objective. First, a detailed geometric model of the seat is constructed using a CAD program. The finite element model is generated from the geometric data of the seat using Hyper-Mesh, commercial software for mesh generation and post-processing. In addition to the seat model, finite element models of the seat belt and dummy are formed using the same software. Rear impact analysis is carried out using Pam-Crash with a crash pulse. In the rear crash analysis, part of the recliner and the right frame are under high stress because the dummy exerts the acceleration force on the back of the seat. The stress condition of part of the bracket is checked as well because it is considered an important variable in seat design. A front impact model that includes the dummy and seat belt is also analyzed. Part of the anchor buckle of the seat frame has a high stress distribution because of the retraction force caused by the forward motion of the dummy at the moment of collision. On the basis of the analysis results, remodeling and reanalysis were repeated until a satisfactory result was obtained.

Research on High-speed Event Detection based on Fuzzy Rule-based Quine-Maccluskey for Streaming Big Data (퍼지 기반 퀸-맥클러스키 규칙 감축 기법을 이용한 대용량 스트리밍 데이터의 고속 이벤트 탐지 기법 연구)

  • Park, Na-Young;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2014.01a
    • /
    • pp.373-376
    • /
    • 2014
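  • With the recent development of mobile and wireless devices, sensor networks are being applied in a variety of fields. Accordingly, detecting and analyzing events in streaming data generated by sensors in real time is emerging as an important research area. Several methods, such as bitmap-index-based complex event detection, are used to quickly determine the conditions under which simple events occur, but research on complex event processing that detects events by fusing the differently structured data produced by heterogeneous sensors is still lacking. In this paper, we propose a high-speed event detection technique that fuzzifies streaming data of differing forms by applying membership functions, which makes it possible to fuse and process data from heterogeneous sensors, and that uses the Quine-McCluskey reduction technique to make decisions with improved rule reliability and speed.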
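The fuzzification step, in which a membership function maps raw sensor readings onto a common [0, 1] scale, can be sketched as follows. The abstract does not say which membership function shape the authors use, so the triangular shape, the `TEMPERATURE_SETS` breakpoints, and the function names are assumptions. The Quine-McCluskey rule reduction is not shown.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), 1 at b, linear between.

    Assumes a < b < c.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(value, fuzzy_sets):
    """Map one raw reading to a membership degree per linguistic label."""
    return {label: triangular(value, *abc) for label, abc in fuzzy_sets.items()}


# Hypothetical breakpoints in degrees Celsius, for illustration only.
TEMPERATURE_SETS = {
    "low": (-10.0, 0.0, 20.0),
    "medium": (10.0, 25.0, 40.0),
    "high": (30.0, 50.0, 70.0),
}

degrees = fuzzify(35.0, TEMPERATURE_SETS)
print(degrees)
```

Because every sensor type ends up expressed as degrees of membership in labels such as "low" or "high", readings with different units can be combined in the same rule conditions. Those conditions are what a reduction method such as Quine-McCluskey would then minimize.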
