• Title/Summary/Keyword: 대용량 데이터셋 (large-scale datasets)

Distributed Processing Method of Hotspot Spatial Analysis Based on Hadoop and Spark (하둡 및 Spark 기반 공간 통계 핫스팟 분석의 분산처리 방안 연구)

  • Kim, Changsoo;Lee, Joosub;Hwang, KyuMoon;Sung, Hyojin
    • Journal of KIISE / v.45 no.2 / pp.99-105 / 2018
  • Hotspot analysis, a form of spatial statistical analysis, is one of the easiest ways to see spatial patterns. It is based on the concept that "adjacent things are more related than distant ones." However, because hotspot analysis must take spatial adjacency into account, distributed processing is not easy. In this paper, we propose a distributed algorithm design for hotspot spatial analysis and compare its performance on a standalone system with Hadoop-based and Spark-based processing. Compared with the standalone system, the performance improvement rate was 625.89% for Hadoop and 870.14% for Spark. Furthermore, the advantage of Spark over Hadoop grew as the dataset became larger.
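The abstract does not say which hotspot statistic the authors use. As a minimal sketch, assuming the widely used Getis-Ord Gi* statistic with binary weights on a regular grid (the function names and the grid layout are illustrative, not taken from the paper), the adjacency dependence looks like this:

```python
import math

def grid_neighbors(rows, cols):
    """Queen-contiguity neighbours on a rows x cols grid; each cell includes itself."""
    result = []
    for r in range(rows):
        for c in range(cols):
            cell = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        cell.append(rr * cols + cc)
            result.append(cell)
    return result

def gi_star(values, neighbors):
    """Getis-Ord Gi* z-score per cell, using binary weights (assumption)."""
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation; clamp tiny negative rounding errors to zero.
    variance = max(sum(v * v for v in values) / n - mean * mean, 0.0)
    s = math.sqrt(variance)
    scores = []
    for idx in neighbors:
        w_sum = len(idx)  # with binary weights, sum(w) == sum(w^2)
        local_sum = sum(values[j] for j in idx)
        denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
        # A zero denominator means no variation; report a neutral score.
        scores.append((local_sum - mean * w_sum) / denom if denom else 0.0)
    return scores

# Toy 5x5 grid with a block of high values in the middle.
values = [1.0] * 25
for r in range(1, 4):
    for c in range(1, 4):
        values[r * 5 + c] = 10.0
scores = gi_star(values, grid_neighbors(5, 5))
```

Each score needs the values of the surrounding cells, so a partition of the grid cannot be processed in isolation: cells on a partition edge need data from the neighbouring partition. That is one plausible reading of why the abstract calls distributed processing "not easy".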

Design and Implementation of the Notification System based on the Event-Profile Model (이벤트-프로파일 모델을 기반으로 한 통지 시스템의 설계 및 구현)

  • Ban, Chae-Hoon;Kim, Dong-Hyun
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.8 / pp.1750-1755 / 2011
  • Recently, as various information-search schemes have been developed, users can easily acquire the data they need. Since these data arrive continuously, like stream data, the data that match a user's needs must be extracted from the mass of data on the Internet. In the traditional scheme, incoming data are first stored in a database and user queries are then processed against them. However, this is inefficient for large volumes of continuous data. In this paper, we propose the Event-Profile Model, which defines data occurrences on the Internet as events and users' requirements as profiles. We also propose and implement a filtering scheme that processes events and profiles efficiently. We evaluate the performance of the proposed scheme, and our experiments show that it outperforms the traditional scheme on various datasets.
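The abstract does not describe the filtering scheme itself. As a rough illustration of the event/profile idea only, the sketch below assumes that a profile is a set of attribute=value conditions and that an event matches when it satisfies all of them. The `ProfileIndex` class and its counting-based matching are hypothetical, not the authors' design.

```python
from collections import defaultdict

class ProfileIndex:
    """Inverted index from (attribute, value) to the profiles that require it."""

    def __init__(self):
        self._index = defaultdict(set)  # (attr, value) -> profile ids
        self._sizes = {}                # profile id -> number of conditions

    def add_profile(self, profile_id, conditions):
        # conditions: dict of attribute -> required value (values must be hashable)
        self._sizes[profile_id] = len(conditions)
        for attr, value in conditions.items():
            self._index[(attr, value)].add(profile_id)

    def match(self, event):
        # Count satisfied conditions per profile; a profile matches when every
        # one of its conditions is satisfied. Profiles with no conditions
        # never match in this sketch.
        hits = defaultdict(int)
        for item in event.items():
            for profile_id in self._index.get(item, ()):
                hits[profile_id] += 1
        return sorted(p for p, count in hits.items() if count == self._sizes[p])

index = ProfileIndex()
index.add_profile("p1", {"topic": "weather", "region": "Seoul"})
index.add_profile("p2", {"topic": "weather"})
index.add_profile("p3", {"topic": "sports"})
matched = index.match({"topic": "weather", "region": "Seoul", "lang": "ko"})
```

The point of indexing profiles rather than data is that each arriving event is checked once against stored profiles, instead of every query being re-run over stored data.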

Design and Implementation of the Notification System using Event-Profile Filtering (이벤트-프로파일 여과를 이용한 통지시스템의 설계 및 구현)

  • Ban, Chae-Hoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.10a / pp.129-132 / 2010
  • Thanks to the development of the Internet, users can obtain useful information from large amounts of data. Since these data arrive continuously, like stream data, the information that matches a user's needs must be extracted efficiently. In the traditional scheme, incoming data are first stored in a database and user queries are then processed against them. However, this is inefficient for large volumes of continuous data. In this paper, we propose the Event-Profile Model, which defines data occurrences on the Internet as events and users' requirements as profiles. We also propose and implement a filtering scheme that processes events and profiles efficiently. We evaluate the performance of the proposed scheme, and our experiments show that it outperforms the traditional scheme on various datasets.

Investigating Opinion Mining Performance by Combining Feature Selection Methods with Word Embedding and BOW (Bag-of-Words) (속성선택방법과 워드임베딩 및 BOW (Bag-of-Words)를 결합한 오피니언 마이닝 성과에 관한 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of Digital Convergence / v.17 no.2 / pp.163-170 / 2019
  • Over the past decade, the development of the Web has explosively increased the volume of data. Feature selection is an important step in extracting valuable data from a large amount of data. This study proposes a novel opinion mining model that combines feature selection (FS) methods with Word2vec word embeddings and BOW (bag-of-words). The FS methods adopted for this study are CFS (correlation-based FS) and IG (information gain). To select an optimal FS method, a number of classifiers were applied, ranging from LR (logistic regression), NN (neural network), and NBN (naive Bayesian network) to RF (random forest), RS (random subspace), and ST (stacking). Empirical results with electronics and kitchen datasets showed that the LR and ST classifiers combined with IG applied to BOW features yield the best opinion mining performance. Results with laptop and restaurant datasets revealed that the RF classifier using IG applied to Word2vec features yields the best performance.
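To make the IG step concrete, here is a minimal sketch of information gain computed over binary bag-of-words features (term present or absent). The toy reviews, labels, and function names are invented for illustration; the paper's datasets and tooling are not reproduced here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    conditional = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - conditional

def rank_terms(docs, labels):
    """All vocabulary terms, sorted by descending information gain."""
    vocabulary = sorted(set().union(*docs))
    return sorted(vocabulary, key=lambda t: information_gain(docs, labels, t), reverse=True)

# Hypothetical toy reviews, each reduced to a set of tokens.
docs = [{"great", "battery"}, {"great", "screen"}, {"poor", "battery"}, {"poor", "screen"}]
labels = ["pos", "pos", "neg", "neg"]
ranking = rank_terms(docs, labels)
```

In this toy data "great" and "poor" separate the classes perfectly (IG of 1 bit), while "battery" and "screen" carry no class information (IG of 0), so a top-k cut on the ranking would keep only the sentiment words.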

Grid Resource Selection System Using Decision Tree Method (의사결정 트리 기법을 이용한 그리드 자원선택 시스템)

  • Noh, Chang-Hyeon;Cho, Kyu-Cheol;Ma, Yong-Beom;Lee, Jong-Sik
    • Journal of the Korea Society of Computer and Information / v.13 no.1 / pp.1-10 / 2008
  • For high-performance data processing, effective resource selection is needed, since grid resources in a grid environment consist of heterogeneous networks and operating systems. In this paper, we classify grid resources by data properties and user requirements and select resources using a decision tree method. By classifying resources with a decision tree, our method provides grid users with suitable resources. We evaluate our grid system in terms of throughput, utilization, job loss, and average turnaround time, and compare the experimental results of our resource selection model with those of existing resource selection models such as Condor-G and Nimrod-G. The experimental results show that our model offers an efficient approach to grid resource selection.

Design of Effective Intrusion Detection System for Wireless Local Area Network (무선랜을 위한 효율적인 침입탐지시스템 설계)

  • Woo, Sung-Hee
    • Journal of the Korea Society of Computer and Information / v.13 no.2 / pp.185-191 / 2008
  • Most WLAN threats arise because attackers can easily access the radio link between the STA and the AP, which lets them intercept network communications or inject additional messages into them. Compared with wired LANs, wireless LANs are more severely exposed to such threats. To compensate for this vulnerability, an intrusion detection system (IDS) that uses a powerful detection method such as SVM is needed. However, because SVM classifies by computing values after representing the input data in a vector space, continuous data types cannot be used as input data. In this paper, therefore, we design an IDS for WLAN by tuning it with SVM and a data-mining mechanism to defend against WLAN vulnerabilities, and we demonstrate the superiority of our method.

Mobile Underground High-capacity 3D Spatial Information Tiling Transfer Protocol Development (모바일 지하 대용량 3D 공간정보 타일링 전송 프로토콜 개발)

  • Lee, Tae Hyung;Jo, Won Je;Kim, Hyun Woo
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.39 no.6 / pp.491-496 / 2021
  • As the safety of underground facilities and the use of underground information are increasingly emphasized, the state is pushing to secure and use more precise and accurate underground spatial information. We therefore need to pay more attention to subsurface geospatial data. The 15 types of Integrated Underground Geospatial Information Map (6 types of underground facilities, 6 types of underground structures, 3 types of ground) that the Ministry of Land, Infrastructure and Transport is building are expected to be actively used as three-dimensional underground spatial information and to contribute greatly to national safety and to convenience in underground construction. However, when a site manager requests an Integrated Underground Geospatial Information Map from a mobile device, inconvenience and work delays result if the large-capacity map is not transmitted quickly over the wireless link. The goal of this paper is to enable field managers to quickly receive a tiled Integrated Underground Geospatial Information Map with minimal information exchange. To that end, the tiling system is configured according to the dataset for high-speed transmission of the Mobile Integrated Underground Geospatial Information Map. In addition, a transmission system for the Mobile Integrated Underground Geospatial Information Map is established, and a TCP/IP (Transmission Control Protocol/Internet Protocol)-based spatial information tiling transmission protocol dedicated to the on-site Integrated Underground Geospatial Information Map is developed.
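The abstract gives no details of the tiling scheme or the wire format. Purely as an illustration of the two ideas it names (requesting only the tiles that cover an area, and framing each tile for a TCP stream), the sketch below assumes a regular square tile grid anchored at the origin and a made-up 13-byte header. None of these choices come from the paper.

```python
import math
import struct

# Hypothetical header: level (uint8), column (int32), row (int32), payload length (uint32).
HEADER = struct.Struct("!BiiI")

def tiles_for_bbox(min_x, min_y, max_x, max_y, tile_size):
    """Column/row indices of every tile that overlaps the bounding box."""
    first_col = math.floor(min_x / tile_size)
    last_col = math.floor(max_x / tile_size)
    first_row = math.floor(min_y / tile_size)
    last_row = math.floor(max_y / tile_size)
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]

def encode_tile(level, col, row, payload):
    """Length-prefixed message so the receiver can split a TCP byte stream into tiles."""
    return HEADER.pack(level, col, row, len(payload)) + payload

def decode_tile(message):
    """Inverse of encode_tile; rejects messages shorter than the declared length."""
    level, col, row, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated tile message")
    return level, col, row, payload

needed = tiles_for_bbox(50, 50, 250, 120, tile_size=100)
message = encode_tile(3, 2, -1, b"abc")
```

Because TCP delivers a byte stream without message boundaries, some form of length prefix (or delimiter) is a common way to let a mobile client recover individual tiles; whether the paper's protocol does this is not stated.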

Korean Machine Reading Comprehension for Patent Consultation Using BERT (BERT를 이용한 한국어 특허상담 기계독해)

  • Min, Jae-Ok;Park, Jin-Woo;Jo, Yu-Jeong;Lee, Bong-Gun
    • KIPS Transactions on Software and Data Engineering / v.9 no.4 / pp.145-152 / 2020
  • MRC (machine reading comprehension) is an AI NLP task that predicts the answer to a user's query by understanding the relevant document, and it can be used in automated consultation services such as chatbots. The BERT (Bidirectional Encoder Representations from Transformers) model, which has recently shown high performance in various fields of natural language processing, is trained in two phases: the first is pre-training on large amounts of data from each domain, and the second is fine-tuning the model to solve each NLP task as a prediction problem. In this paper, we build a Patent MRC dataset and show how to construct patent consultation training data for the MRC task. We also propose a method to improve MRC performance using a Patent-BERT model pre-trained on a patent consultation corpus, together with a language processing algorithm suited to machine learning on patent counseling data. Our experiments show that the proposed method improves performance in answering patent counseling queries.
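The abstract does not say how answers are decoded. A common setup for BERT-based MRC is extractive: the fine-tuned model outputs a start score and an end score per token, and the answer is the highest-scoring valid span. The sketch below shows only that decoding step on made-up scores, under that assumption; `best_span` and `max_answer_len` are illustrative names.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) token indices maximising start score + end score.

    Only spans with start <= end and at most max_answer_len tokens are considered.
    """
    best = None
    for start, start_score in enumerate(start_logits):
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if best is None or score > best[0]:
                best = (score, start, end)
    if best is None:
        raise ValueError("no valid span: empty logits or max_answer_len < 1")
    return best[1], best[2]

# Hypothetical per-token scores for a four-token passage.
start_logits = [0.1, 2.0, 0.3, 0.2]
end_logits = [0.0, 0.1, 3.0, 0.5]
span = best_span(start_logits, end_logits)
```

Here the best span covers tokens 1 through 2. Restricting the span length is a simple guard against implausibly long answers.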

A Study on the Semiautomatic Construction of Domain-Specific Relation Extraction Datasets from Biomedical Abstracts - Mainly Focusing on a Genic Interaction Dataset in Alzheimer's Disease Domain - (바이오 분야 학술 문헌에서의 분야별 관계 추출 데이터셋 반자동 구축에 관한 연구 - 알츠하이머병 유관 유전자 간 상호 작용 중심으로 -)

  • Choi, Sung-Pil;Yoo, Suk-Jong;Cho, Hyun-Yang
    • Journal of Korean Library and Information Science Society / v.47 no.4 / pp.289-307 / 2016
  • This paper introduces a software system and process model for constructing domain-specific relation extraction datasets semi-automatically. The system takes a set of terms such as genes, proteins, and diseases as input and, by exploiting a massive biological interaction database, generates a set of term pairs, which are then used as queries to retrieve sentences containing the pairs from scientific databases. To assess the usefulness of the proposed system, this paper applies it to the construction of a genic interaction dataset for the Alzheimer's disease domain, extracting 3,510 interaction-related sentences using 140 gene names in the area. The results of this case study indicate that the system and process could greatly improve the efficiency of dataset construction in various subfields of biomedical research.
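The pipeline described above (terms, then known-interaction pairs, then sentence retrieval) can be sketched as follows. The placeholder gene names, the in-memory "interaction database", and the sentence list are all invented for illustration; the real system queries external biological and scientific databases that the abstract does not name.

```python
import re
from itertools import combinations

def candidate_pairs(terms, known_interactions):
    """Term pairs that also appear (in either order) in the interaction database."""
    known = {frozenset(pair) for pair in known_interactions}
    return [pair for pair in combinations(sorted(terms), 2) if frozenset(pair) in known]

def sentences_for_pair(pair, sentences):
    """Sentences that mention both terms of the pair as whole words."""
    patterns = [re.compile(r"\b" + re.escape(term) + r"\b") for term in pair]
    return [s for s in sentences if all(p.search(s) for p in patterns)]

# Placeholder data, not real genes or findings.
terms = ["geneA", "geneB", "geneC"]
known_interactions = [("geneB", "geneA")]
sentences = [
    "geneA binds geneB in vitro.",
    "geneA is expressed widely.",
    "geneAB differs from geneB.",
]
pairs = candidate_pairs(terms, known_interactions)
retrieved = {pair: sentences_for_pair(pair, sentences) for pair in pairs}
```

Filtering pairs through the interaction database before searching keeps the number of queries far below the full set of term combinations, which is presumably where the "semi-automatic" efficiency gain comes from; the retrieved sentences would still need human review.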

Anomaly Detection Methodology Based on Multimodal Deep Learning (멀티모달 딥 러닝 기반 이상 상황 탐지 방법론)

  • Lee, DongHoon;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.101-125 / 2022
  • Recently, with advances in computing technology and improvements in the cloud environment, deep learning has developed rapidly, and attempts to apply it to various fields are increasing. A typical example is anomaly detection, a technique for identifying values or patterns that deviate from normal data. Among the representative types of anomaly detection, contextual anomalies, which require an understanding of the overall situation, are very difficult to detect. In general, anomalies in image data are detected using a pre-trained model trained on large data. However, since such a pre-trained model was created with a focus on object classification in images, it is of limited use for anomaly detection that requires understanding complex situations created by various objects. Therefore, in this study, we propose a new two-step pre-trained model for detecting abnormal situations. Our methodology performs additional learning from image captioning to understand not only individual objects but also the complicated situations they create. Specifically, the proposed methodology transfers the knowledge of a pre-trained model that has learned object classification on ImageNet data to an image captioning model, and uses the caption that describes the situation represented by the image. Afterwards, the weights obtained by learning the situational characteristics through images and captions are extracted, and fine-tuning is performed to generate an anomaly detection model. To evaluate the proposed methodology, an anomaly detection experiment was performed on 400 situational images, and the results showed that the proposed methodology was superior to the existing pre-trained model in terms of anomaly detection accuracy and F1-score.
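For reference, the two evaluation metrics named in the abstract can be computed as below. This is a generic sketch on invented labels (1 = anomalous, 0 = normal); it does not reproduce the paper's experiment or its results.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and F1-score for a binary task, treating `positive` as the anomaly class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1

# Hypothetical ground truth and predictions for eight images.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
```

F1 is reported alongside accuracy because anomalies are usually rare: a model that labels everything "normal" can score high accuracy while its F1 on the anomaly class is zero.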