• Title/Summary/Keyword: 대용량 분류 (large-scale classification)

Design and Implementation of Query Classification Component in Multi-Level DBMS for Location Based Service (위치기반 서비스를 위한 다중레벨 DBMS에 질의 분류 컴포넌트의 설계 및 구현)

  • Jang Seok-Kyu;Eo Sang Hun;Kim Myung-Heun;Bae Hae-Young
    • The KIPS Transactions:PartD
    • /
    • v.12D no.5 s.101
    • /
    • pp.689-698
    • /
    • 2005
  • Various systems are used to provide location-based services. However, existing systems have difficulty delivering fast service to more than a million users. To solve this problem, a multi-level DBMS that supports both fast data processing and large-scale data management should be used. A multi-level DBMS with snapshots keeps all data in the disk database and manages the data that requires fast processing in the main-memory database as snapshots. To optimize the performance of such a system for location-based services, a query classification component that classifies queries for efficient snapshot usage is needed. In this paper, a query classification component in a multi-level DBMS for location-based services is designed and implemented. The proposed component classifies queries into three types, (1) memory query, (2) disk query, and (3) hybrid query, and increases the rate of snapshot usage. In addition, it applies division mechanisms that separate aspatial and spatial filter conditions so that snapshots can be used partially. Hence, the proposed component enhances system performance by maximizing snapshot usage through efficient query classification.
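
The three query types named in this abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that a query is classified by whether the tables it references have main-memory snapshots. The names `QueryType`, `classify_query`, and the table names are illustrative and do not come from the paper, whose actual classification rules are not given in the abstract.

```python
from enum import Enum


class QueryType(Enum):
    MEMORY = "memory"   # every referenced table has a snapshot
    DISK = "disk"       # no referenced table has a snapshot
    HYBRID = "hybrid"   # some referenced tables have snapshots, some do not


def classify_query(tables, snapshot_tables):
    """Classify a query by how many of its tables have a main-memory snapshot.

    Assumption: snapshot coverage is decided per table. The paper also splits
    spatial and aspatial filter conditions, which this sketch does not model.
    """
    referenced = set(tables)
    in_memory = referenced & set(snapshot_tables)
    if referenced and in_memory == referenced:
        return QueryType.MEMORY
    if not in_memory:
        return QueryType.DISK
    return QueryType.HYBRID


snapshots = {"vehicle", "poi"}
print(classify_query(["vehicle"], snapshots))
print(classify_query(["vehicle", "road"], snapshots))
print(classify_query(["road"], snapshots))
```

Under this assumption, a hybrid query is one that would be answered partly from snapshots and partly from the disk database.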

Processing Speed Improvement of HTTP Traffic Classification Based on Hierarchical Structure of Signature (시그니쳐 계층 구조에 기반한 HTTP 트래픽 분석 시스템의 처리 속도 향상)

  • Choi, Ji-Hyeok;Park, Jun-Sang;Kim, Myung-Sup
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.39B no.4
    • /
    • pp.191-199
    • /
    • 2014
  • HTTP traffic has grown rapidly with the appearance of various web-based applications and services. Accordingly, HTTP traffic classification is necessary for effective network management. Among the various signature-based methods, the payload signature-based classification method is effective for analyzing various aspects of HTTP traffic. However, the payload signature-based method has a significant drawback in high-speed network environments because its processing speed is slower than that of other classification methods, such as header signature-based and statistical signature-based methods. Therefore, we propose a classification method for HTTP traffic based on hierarchically structured HTTP signatures, which improves pattern matching speed by exploiting the features of the hierarchical structure. When applied to real campus network traffic, the proposed method achieved better performance than Aho-Corasick.
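
The idea of matching against hierarchically organized signatures can be sketched as follows. This is a minimal illustration under an assumed two-level hierarchy (host, then path prefix); the abstract does not specify the paper's actual signature structure. The functions `build_index` and `classify_request` and the sample signatures are hypothetical.

```python
def build_index(signatures):
    """Group (host, path_prefix, label) signatures by host."""
    index = {}
    for host, path_prefix, label in signatures:
        index.setdefault(host, []).append((path_prefix, label))
    # Longest prefixes first, so the most specific signature wins.
    for entries in index.values():
        entries.sort(key=lambda entry: len(entry[0]), reverse=True)
    return index


def classify_request(index, host, path):
    """Match the host first, then test only that host's path signatures."""
    for path_prefix, label in index.get(host, []):
        if path.startswith(path_prefix):
            return label
    return None  # no signature matched


signatures = [
    ("example.com", "/video", "video"),
    ("example.com", "/", "web"),
    ("mail.example.com", "/", "mail"),
]
index = build_index(signatures)
print(classify_request(index, "example.com", "/video/1"))
```

The speedup in such a scheme comes from pruning: once the top-level key is resolved, only the signatures below it are compared instead of the full signature set.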

Performance Improvement of Radial Basis Function Neural Networks Using Adaptive Feature Extraction (적응적 특징추출을 이용한 Radial Basis Function 신경망의 성능개선)

  • 조용현
    • Journal of Korea Multimedia Society
    • /
    • v.3 no.3
    • /
    • pp.253-262
    • /
    • 2000
  • This paper proposes a new RBF neural network that determines the number and centers of hidden neurons based on adaptive feature extraction from the input data. Principal component analysis is applied to extract features adaptively by reducing the dimension of the given input data. The network can thus combine the advantages of principal component analysis, which maps input data into a set of statistically independent features, with those of RBF neural networks. The proposed neural network has been applied to classify 200 breast cancer databases into 2 classes. The simulation results show that the proposed neural network achieves better learning time and better classification performance on test data than networks using the k-means clustering algorithm. It is also less affected than the k-means clustering algorithm by the initial weight setting and the range of the smoothing factor.

A Comparison and Analysis on High-Dimensional Clustering Techniques for Data Mining (데이터 마이닝을 위한 고차원 클러스터링 기법에 관한 비교 분석 연구)

  • 김홍일;이혜명
    • Journal of the Korea Computer Industry Society
    • /
    • v.4 no.12
    • /
    • pp.887-900
    • /
    • 2003
  • Many applications require the clustering of large amounts of high-dimensional data. Many automated clustering techniques have been developed, but they do not work effectively and/or efficiently on high-dimensional (numerical) data because of the so-called “curse of dimensionality”. Moreover, high-dimensional data often contain a significant amount of noise, which further reduces the effectiveness of the algorithms. Therefore, it is necessary to examine the structure and characteristics of high-dimensional data and to develop algorithms that support clustering adapted to applications of high-dimensional databases. In this paper, we investigate and classify the existing high-dimensional clustering methods by analyzing and comparing the strengths and weaknesses of each method for specific applications. In particular, we compare the traditional algorithms with CLIP, which we developed, in terms of efficiency and effectiveness. This study will contribute to the development of algorithms more advanced than the current ones.

Design and Implementation of a Directory System for Disease Retrieval Services (질병 검색 서비스를 위한 디렉토리 시스템 설계 및 구현)

  • Yeo, Myung-ho;Lee, Yoon-kyeong;Rho, Kyu-jong;Park, Hyoung-soon;Kim, Hak-sin;Park, Jun-ho;Kang, Tae-ho;Kim, Hak-yong;Yoo, Jae-soo
    • Proceedings of the Korea Contents Association Conference
    • /
    • 2009.05a
    • /
    • pp.709-714
    • /
    • 2009
  • Recently, biological research has been required to deal with large-scale data. While scientists relied on classical experimental approaches in the past, the convergence of information technology and biology now makes it possible to obtain more sophisticated observations easily. The study of diseases is one of the most important issues in the life sciences. Conventional services and databases provide users with information such as disease classifications, symptoms, and medical treatments through the web. However, it is hard to connect them or to build new services on them because they use independent and differing criteria. This may hinder the development of biology. In this paper, we propose an integrated data structure for the disease database, and design and implement a novel directory system for diseases as an infrastructure for developing other new services.

Extraction of Ground Points from LiDAR Data using Quadtree and Region Growing Method (Quadtree와 영역확장법에 의한 LiDAR 데이터의 지면점 추출)

  • Bae, Dae-Seop;Kim, Jin-Nam;Cho, Gi-Sung
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.19 no.3
    • /
    • pp.41-47
    • /
    • 2011
  • Processing raw LiDAR data requires a high-end processor because the data are in vector form. In contrast, if LiDAR data are converted into a regular grid by filtering, they can be processed on low-cost equipment because of the simple structure and faster processing speed. In particular, grid data classification such as Quadtree removes some trees and cars, which is an advantage for modeling. Therefore, this study presents an algorithm for automatic extraction of ground points from LiDAR data using Quadtree and the region growing method. In addition, error analysis was performed against the 1:5000 digital map of the sample area to evaluate the classification of ground points. As a result, the ground classification accuracy is over 98%, which shows that the method is well suited to extracting ground points. In addition, non-ground points, such as cars and trees, are effectively removed by using Quadtree and the region growing method.
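
The region growing step on gridded elevation data can be sketched as below. This is a generic illustration rather than the authors' algorithm: it omits the Quadtree stage, and it assumes a 4-connected grid, a caller-supplied seed cell, and a hypothetical height tolerance `max_step` between neighboring cells.

```python
from collections import deque


def grow_ground(grid, seed, max_step=0.5):
    """Return the set of (row, col) cells reachable from the seed.

    A neighbor joins the region when its elevation differs from the
    current cell by at most max_step (an assumed tolerance, in the same
    unit as the grid values).
    """
    rows, cols = len(grid), len(grid[0])
    ground = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in ground:
                if abs(grid[nr][nc] - grid[r][c]) <= max_step:
                    ground.add((nr, nc))
                    queue.append((nr, nc))
    return ground


# The centre cell stands for an object (for example a car or a tree)
# that rises sharply above the surrounding terrain.
elevation = [
    [10.0, 10.1, 10.2],
    [10.1, 14.0, 10.3],
    [10.2, 10.3, 10.4],
]
ground_cells = grow_ground(elevation, seed=(0, 0))
print(sorted(ground_cells))
```

In this example the gently sloping cells are collected as ground, while the abrupt 14.0 cell is left out because no neighbor is within the tolerance.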

Multi-Attribute based on Data Management Scheme in Big Data Environment (빅 데이터 환경에서 다중 속성 기반의 데이터 관리 기법)

  • Jeong, Yoon-Su;Kim, Yong-Tae;Park, Gil-Cheol
    • Journal of Digital Convergence
    • /
    • v.13 no.1
    • /
    • pp.263-268
    • /
    • 2015
  • With the development of IT, ubiquitous information technology that ties information to objects through sensors and mobile networks has been developed. However, security solutions provide only minimal protection for the data stored on servers. In this paper, we propose a data management method that applies a hash chain to multiple attributes of the data, so that the large amounts of data provided by big data services and used by their users can be handled safely. The proposed method classifies the attribute information of the data according to the type, function, and characteristics of the data used in big data services, and improves data safety by binding the classified attributes to a hash chain. In addition, it supports distributed processing of big data by using the access control information of the hash chain to link data attribute information, which makes geographically dispersed data easy to access.
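
A hash chain over data attributes can be sketched generically as follows. The abstract does not describe the paper's construction, so everything here is an assumption made for illustration: SHA-256 as the hash function, string attributes, and the hypothetical helpers `build_chain` and `verify_chain`.

```python
import hashlib


def build_chain(attributes, seed=b"seed"):
    """Link each attribute to the hash of everything before it."""
    links = []
    prev = hashlib.sha256(seed).hexdigest()
    for attr in attributes:
        # Each link depends on the previous link and the current attribute.
        prev = hashlib.sha256((prev + attr).encode("utf-8")).hexdigest()
        links.append(prev)
    return links


def verify_chain(attributes, links, seed=b"seed"):
    """Recompute the chain and compare it with the stored links."""
    return build_chain(attributes, seed) == links


attrs = ["type=image", "owner=alice", "region=seoul"]
chain = build_chain(attrs)
print(verify_chain(attrs, chain))
print(verify_chain(["type=image", "owner=bob", "region=seoul"], chain))
```

Because every link depends on all earlier attributes, changing one attribute invalidates that link and every link after it.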

Classifying and Characterizing the Types of Gentrified Commercial Districts Based on Sense of Place Using Big Data: Focusing on 14 Districts in Seoul (빅데이터를 활용한 젠트리피케이션 상권의 장소성 분류와 특성 분석 -서울시 14개 주요상권을 중심으로-)

  • Young-Jae Kim;In Kwon Park
    • Journal of the Korean Regional Science Association
    • /
    • v.39 no.1
    • /
    • pp.3-20
    • /
    • 2023
  • This study aims to categorize the 14 major gentrified commercial areas of Seoul and analyze their characteristics based on their sense of place. To achieve this, we conducted hierarchical cluster analysis using text data collected from Naver Blog. We divided the districts into two dimensions: "experience" and "feature" and analyzed their characteristics using LDA (Latent Dirichlet Allocation) of the text data and statistical data collected from Seoul Open Data Square. As a result, we classified the commercial districts of Seoul into 5 categories: 'theater district,' 'traditional cultural district,' 'female-beauty district,' 'exclusive restaurant and medical district,' and 'trend-leading district.' The findings of this study are expected to provide valuable insights for policy-makers to develop more efficient and suitable commercial policies.

Automatic Word Spacing of the Korean Sentences by Using End-to-End Deep Neural Network (종단 간 심층 신경망을 이용한 한국어 문장 자동 띄어쓰기)

  • Lee, Hyun Young;Kang, Seung Shik
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.8 no.11
    • /
    • pp.441-448
    • /
    • 2019
  • Previous research on automatic spacing of Korean sentences has corrected spacing errors by using n-gram-based statistical techniques or a morpheme analyzer to insert blanks at word boundaries. In this paper, we propose end-to-end automatic word spacing using a deep neural network. The automatic word spacing problem can be defined as a tag classification problem at the level of syllables rather than words. For contextual representation between syllables, a Bi-LSTM encodes the dependency relationships between syllables into a fixed-length vector in a continuous vector space using forward and backward LSTM cells. To perform automatic word spacing of Korean sentences, the fixed-length contextual vector produced by the Bi-LSTM is classified into an auto-spacing tag (B or I), and a blank is then inserted in front of each B tag. For tag classification, we build three types of classification neural networks: a feedforward neural network, a neural network language model, and a linear-chain CRF. To compare our models, we measure automatic word spacing performance for each of the three classification networks. Among them, the linear-chain CRF shows better performance than the other models. We used the KCC150 corpus as training and testing data.
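
The final decoding step, inserting a blank in front of each B tag, can be sketched as below. The function `apply_spacing` is a hypothetical helper, and the tags are written by hand here; in the paper they would come from the classification network. Skipping the blank before the first syllable is an assumption of this sketch.

```python
def apply_spacing(syllables, tags):
    """Rebuild a spaced sentence from per-syllable B/I tags.

    B marks the first syllable of a word and I marks a continuation, so a
    blank goes in front of every B except at the start of the sentence.
    """
    if len(syllables) != len(tags):
        raise ValueError("each syllable needs exactly one tag")
    pieces = []
    for position, (syllable, tag) in enumerate(zip(syllables, tags)):
        if tag == "B" and position > 0:
            pieces.append(" ")
        pieces.append(syllable)
    return "".join(pieces)


# Unspaced input and hand-written tags for illustration.
print(apply_spacing("나는학교에간다", ["B", "I", "B", "I", "I", "B", "I"]))
# prints "나는 학교에 간다"
```

Framing the task this way reduces spacing to one binary decision per syllable, which is what lets the three classifier types be compared on equal terms.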

A Study on an Automatic Classification Model for Facet-Based Multidimensional Analysis of Civil Complaints (패싯 기반 민원 다차원 분석을 위한 자동 분류 모델)

  • Na Rang Kim
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.29 no.1
    • /
    • pp.135-144
    • /
    • 2024
  • In this study, we propose an automatic classification model for quantitative multidimensional analysis based on facet theory to understand public opinions and demands on major issues through big data analysis. Civil complaints, as a form of public feedback, are generated by various individuals on multiple topics repeatedly and continuously in real-time, which can be challenging for officials to read and analyze efficiently. Specifically, our research introduces a new classification framework that utilizes facet theory and political analysis models to analyze the characteristics of citizen complaints and apply them to the policy-making process. Furthermore, to reduce administrative tasks related to complaint analysis and processing and to facilitate citizen policy participation, we employ deep learning to automatically extract and classify attributes based on the facet analysis framework. The results of this study are expected to provide important insights into understanding and analyzing the characteristics of big data related to citizen complaints, which can pave the way for future research in various fields beyond the public sector, such as education, industry, and healthcare, for quantifying unstructured data and utilizing multidimensional analysis. In practical terms, improving the processing system for large-scale electronic complaints and automation through deep learning can enhance the efficiency and responsiveness of complaint handling, and this approach can also be applied to text data processing in other fields.