• Title/Summary/Keyword: Classification for Each


Keyword Reorganization Techniques for Improving the Identifiability of Topics (토픽 식별성 향상을 위한 키워드 재구성 기법)

  • Yun, Yeoil;Kim, Namgyu
    • Journal of Information Technology Services / v.18 no.4 / pp.135-149 / 2019
  • Recently, much research has focused on extracting meaningful information from large amounts of text data. Among the various approaches to extracting information from text, topic modeling, which expresses latent topics as groups of keywords, is widely used. Topic modeling presents several keywords per topic ranked by term/topic weight, and the quality of those keywords is usually evaluated through coherence, which reflects how similar the keywords are to one another. However, a topic quality evaluation method based only on keyword similarity has its limitations, because it is difficult to describe the content of a topic accurately enough with just a set of similar words. In this research, therefore, we propose a topic keyword reorganizing method to improve the identifiability of topics. To reorganize topic keywords, each document first needs to be labeled with one representative topic, which can be extracted by traditional topic modeling. After that, classification rules for classifying each document into its corresponding label are generated, and new topic keywords are extracted based on the classification rules. To evaluate the performance of our method, we performed an experiment on 1,000 news articles. From the experiment, we confirmed that the keywords extracted by our proposed method have better identifiability than traditional topic keywords.
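
The labeling, rule generation, and keyword extraction steps described in this abstract can be illustrated with a minimal sketch. The abstract does not say which rule learner the authors used, so this sketch substitutes the simplest possible rule, "if a word appears in a document, assign the topic", and ranks words by the precision of that rule. The function name `reorganize_keywords`, the `min_support` threshold, and the toy documents and labels are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def reorganize_keywords(docs, labels, top_n=2, min_support=2):
    """Pick, per topic, the words whose one-word rule 'word in doc -> topic'
    is most precise. This is a stand-in for a real rule learner."""
    doc_freq = Counter()                  # documents containing each word
    topic_doc_freq = defaultdict(Counter) # same count, split by topic label
    for tokens, label in zip(docs, labels):
        for word in set(tokens):
            doc_freq[word] += 1
            topic_doc_freq[label][word] += 1

    keywords = {}
    for label, counts in topic_doc_freq.items():
        scored = [
            (counts[w] / doc_freq[w], counts[w], w)  # (precision, support, word)
            for w in counts
            if counts[w] >= min_support
        ]
        # Highest precision first, then highest support, then alphabetical.
        scored.sort(key=lambda t: (-t[0], -t[1], t[2]))
        keywords[label] = [w for _, _, w in scored[:top_n]]
    return keywords

# Hypothetical tokenized documents, each already labeled with one
# representative topic (in the abstract, this label comes from topic modeling).
docs = [
    ["market", "stock", "price", "rise"],
    ["stock", "price", "bank", "report"],
    ["market", "bank", "stock", "report"],
    ["team", "match", "score", "report"],
    ["team", "score", "price", "season"],
    ["match", "team", "season", "market"],
]
labels = ["economy"] * 3 + ["sports"] * 3

keywords = reorganize_keywords(docs, labels)
print(keywords)
```

In this toy data, words such as "report" and "market" are frequent but occur under both labels, so they rank below words that occur under only one label.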

A Classification Algorithm Based on Data Clustering and Data Reduction for Intrusion Detection System over Big Data

  • Wang, Qiuhua;Ouyang, Xiaoqin;Zhan, Jiacheng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.7 / pp.3714-3732 / 2019
  • With the rapid development of networks, the Intrusion Detection System (IDS) plays an increasingly important role in network applications. Many data mining algorithms are used to build IDSs. However, with the advent of the big data era, massive amounts of data are generated. When dealing with large-scale data sets, most data mining algorithms suffer from a high computational burden, which makes an IDS much less efficient. To build an efficient IDS over big data, we propose a classification algorithm based on data clustering and data reduction. In the training stage, the training data are divided into clusters of similar size by the Mini Batch K-Means algorithm, and the center of each cluster is used as its index. Then, we select representative instances from each cluster to perform data reduction and use the clusters of representative instances to build a K-Nearest Neighbor (KNN) detection model. In the detection stage, we sort the clusters by the distance between the test sample and each cluster index, and obtain the k nearest clusters, within which we find the k nearest neighbors. Experimental results show that searching for neighbors through cluster indexes reduces the computational complexity significantly, and that classification with the reduced data of representative instances not only improves efficiency but also maintains high accuracy.
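
The detection stage described above (rank clusters by distance to their indexes, then search for neighbors only inside the nearest clusters) can be sketched as follows. This is a simplified illustration, not the authors' implementation: plain Lloyd's k-means stands in for Mini Batch K-Means, the step that selects representative instances is omitted, and the separate `n_clusters` parameter, the function names, and the toy data are assumptions.

```python
import math
from collections import Counter

def mean_point(pts):
    pts = list(pts)
    return tuple(sum(coord) / len(pts) for coord in zip(*pts))

def assign(points, centers):
    """Group point indices by their nearest center."""
    groups = [[] for _ in centers]
    for i, p in enumerate(points):
        nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        groups[nearest].append(i)
    return groups

def kmeans(points, centers, iters=10):
    """Plain Lloyd's k-means, used here in place of Mini Batch K-Means."""
    for _ in range(iters):
        groups = assign(points, centers)
        centers = [
            mean_point(points[i] for i in g) if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return centers, assign(points, centers)

def predict(x, centers, groups, points, labels, k=3, n_clusters=1):
    # Sort clusters by the distance from x to each cluster center (its index).
    order = sorted(range(len(centers)), key=lambda c: math.dist(x, centers[c]))
    # Only points in the nearest clusters are candidate neighbors.
    candidates = [i for c in order[:n_clusters] for i in groups[c]]
    nearest = sorted(candidates, key=lambda i: math.dist(x, points[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Toy two-feature training set (hypothetical values).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["normal"] * 3 + ["attack"] * 3
centers, groups = kmeans(points, [points[0], points[3]])

print(predict((0.1, 0.1), centers, groups, points, labels))
print(predict((5.1, 5.0), centers, groups, points, labels))
```

Because distances are computed only against the cluster centers and the members of the selected clusters, the number of comparisons per test sample is much smaller than in a full scan of the training set.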

Establishment of Risk Database and Development of Risk Classification System for NATM Tunnel (NATM 터널 공정리스크 데이터베이스 구축 및 리스크 분류체계 개발)

  • Kim, Hyunbee;Karunarathne, Batagalle Vinuri;Kim, ByungSoo
    • Korean Journal of Construction Engineering and Management / v.25 no.1 / pp.32-41 / 2024
  • In the construction industry, not only safety accidents but also various complex risks such as construction delays, cost increases, and environmental pollution occur, and management technologies are needed to address them. Among these, process risk management, which directly affects the project, lacks related information relative to its importance. This study sought to develop a NATM tunnel process risk classification system to resolve the difficulty of retrieving risk information caused by the use of a different classification system for each project. Risks were collected through a review of the existing literature and experience mining techniques, and the database was constructed using natural language processing concepts. For the structure of the classification system, the existing work breakdown structure (WBS) was adopted for data compatibility, and a risk breakdown structure (RBS) linked to the work types of the WBS was established. As a result, a risk classification system was completed that makes it easy to identify risks by work type and intuitively reveals the risk characteristics and risk factors linked to each risk. Verification of the usability of the classification system showed that it was effective: risks and risk factors for each work type were easily identified from keywords entered by the user. This study is expected to help prevent increases in cost and construction period by identifying risks by work type in advance, during the planning and design of NATM tunnels, and by establishing countermeasures suited to those risk factors.

The Validation Study of the Questionnaire of Sasang Constitution Classification: Comparative Analysis with Sixteen Personality Factor Questionnaire (16PF) (사상체질분류검사(四象體質分類檢査)의 준거타당화(准据妥當化) 연구(硏究) (성격요인검사(性格要因檢査)-16PF-와의 비교(比較) 분석(分析)))

  • Lee, Jung-Chan;Ko, Byung-He;Song, Il-Byung
    • Journal of Sasang Constitutional Medicine / v.5 no.1 / pp.87-104 / 1993
  • This study was performed to analyze the validity of the Questionnaire of Sasang Constitution Classification (QSCC), which was developed through several stages to promote the objectivity of Sasang constitutional classification. The results of the statistical analysis of the responses to the Sasang questionnaire are as follows: 1. In the correlation between the items of the QSCC and the 16PF, every item displayed a significant result (in the male group). 2. When the score distribution of the Sasang items was divided into three groups and the differences in mean values on each 16PF factor were analyzed, the characteristics of each Sasang group turned out as follows: (1) The Tae-Yang group has an extroverted and narcissistic inclination. (2) The So-Yang group shows a remarkably extroverted inclination. (3) The Tae-Eum group has hidden fear and an introverted inclination. (4) The So-Eum group shows physical instability and an introverted inclination. 3. The statistical analysis of the female group did not display any significant result. It will be necessary to analyze the response patterns of the female group and to introduce new items that fit feminine psychology. 4. When the 16PF was used together with the QSCC to classify the Sasang constitution, the accuracy rate of diagnosis showed an encouraging increase. For that reason, it seems desirable to introduce some items of the 16PF into the QSCC. According to the above results, although the QSCC has several problems still to be solved, its validity was demonstrated, and the statistical analysis suggests that introducing the 16PF could help improve the accuracy of Sasang constitutional diagnosis and ensure the objectivity of Sasang constitution classification.


A Study on the Topical Associations of Simultaneously Borrowed Books in Public Libraries (공공도서관 동시 대출 도서의 주제 연관성 분석 연구)

  • Woojin Kang;In Yeong Jeong;Jongwook Lee
    • Journal of Korean Library and Information Science Society / v.54 no.3 / pp.33-55 / 2023
  • There has been research to understand users' information behaviors using book circulation data of public libraries. In this study, we examined the subject areas of books simultaneously borrowed by users of public libraries and aimed to identify the relationships among the subject areas. To accomplish this, we utilized the Korean Decimal Classification codes of 984,790 loaned books in 2019 to transform the lists of concurrently borrowed books, totaling 22,443,699 records, by the same users on the same day, into vectors using the ITEM2VEC technique. Next, we extracted ten highly related classification codes for each classification code, utilizing a total of 522 classification codes to create a network. We identified 15 communities within this network and examined the characteristics of each community. Among the 15 communities, those consisting of two or more main classes allowed us to identify meaningful thematic associations. This study, grounded in users' book usage behaviors, has suggested the topics of books that could be borrowed together. The findings offer valuable insights for library collection development and placement, recommending related subject materials, and revising classification systems.
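
The step of extracting the most related classification codes for each code can be illustrated with a rough sketch. The study embeds codes with ITEM2VEC; this sketch deliberately swaps in a much simpler measure, raw co-borrowing counts, so that it runs with the standard library alone. The function names, the classification codes, and the loan baskets are arbitrary placeholders, not data from the study.

```python
from collections import Counter, defaultdict
from itertools import combinations

def co_borrow_counts(baskets):
    """Count how often two classification codes appear in the same basket
    (books borrowed by the same user on the same day)."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def top_related(counts, code, n=10):
    """Return up to n codes most often borrowed together with `code`.
    Ties are broken alphabetically so the output is deterministic."""
    ranked = sorted(counts[code].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:n]]

# Hypothetical baskets of classification codes.
baskets = [
    ["813", "833", "813"],
    ["813", "833"],
    ["005", "004"],
    ["813", "005"],
    ["004", "005", "813"],
]

counts = co_borrow_counts(baskets)
# Edges of the relatedness network: each code linked to its top related codes.
edges = {code: top_related(counts, code, n=2) for code in sorted(counts)}
print(edges)
```

With an embedding model in place of raw counts, `top_related` would rank codes by vector similarity instead; the network construction that follows would be the same.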

Development of Classification Method for the Remote Sensing Digital Image Using Canonical Correlation Analysis (정준상관분석을 이용한 원격탐사 수치화상 분류기법의 개발 : 무감독분류기법과 정준상관분석의 통합 알고리즘)

  • Kim, Yong-Il;Kim, Dong-Hyun;Park, Min-Ho
    • Journal of Korean Society for Geospatial Information Science / v.4 no.2 s.8 / pp.181-193 / 1996
  • This paper proposes a new technique for land cover classification in which a digital image pre-classified by an unsupervised classification technique (clustering) is applied to Canonical Correlation Analysis (CCA). Compared with maximum likelihood classification, the proposed technique offers good flexibility in selecting training areas; that is, the position of the selected training areas has little effect on the classification results. The land cover of each cluster designated by CCA after clustering can be used as prior information for maximum likelihood classification. When the same training areas are used, the accuracy of classification using CCA after cluster analysis is better than that of maximum likelihood classification. Therefore, the technique proposed in this study can be put to practical use, and it is expected to play an important role in the construction of GIS databases.


Multiple SVM Classifier for Pattern Classification in Data Mining (데이터 마이닝에서 패턴 분류를 위한 다중 SVM 분류기)

  • Kim Man-Sun;Lee Sang-Yong
    • Journal of the Korean Institute of Intelligent Systems / v.15 no.3 / pp.289-293 / 2005
  • Pattern classification extracts various types of pattern information that express objects in the real world and decides their class. The top priority of pattern classification technologies is to improve classification performance, and many researchers have tried various approaches to this end over the last 40 years. Classification methods used in pattern classification include Bayes classifiers based on probabilistic inference over patterns, decision trees, distance-function-based methods, neural networks, and clustering, but they are not efficient at analyzing large amounts of multi-dimensional data. Thus, there is active research on multiple classifier systems, which improve classification performance by combining a number of mutually complementary classifiers. The present study identifies problems in previous research on multiple SVM classifiers and proposes BORSE, a model that, based on a 1:M policy for extending SVM to a multi-class classifier, regards each SVM output as a signal with a non-linear pattern, trains a neural network on that pattern, and combines the results into the final classification.

CIRCLE ACTIONS ON ORIENTED MANIFOLDS WITH FEW FIXED POINTS

  • Jang, Donghoon
    • East Asian mathematical journal / v.36 no.5 / pp.593-604 / 2020
  • Let the circle act on a compact oriented manifold with a discrete fixed point set. At each fixed point, there are positive integers called weights, which describe the local action of S^1 near the fixed point. In this paper, we provide the author's original proof, which uses only the Atiyah-Singer index formula, of the classification of the weights at the fixed points when the dimension of the manifold is 4 and there are at most 4 fixed points. This proof made it possible for the author to give a classification for any finite number of fixed points.

Neural Networks and Logistic Models for Classification: A Case Study

  • Hwang, Chang-Ha
    • Journal of the Korean Data and Information Science Society / v.7 no.1 / pp.13-19 / 1996
  • In this paper, we study and compare two types of classification methods for the case where both continuous and categorical variables are used to describe each individual. One is the neural network (NN) method using backpropagation learning (BPL); the other is the logistic model (LM) method. Both the NN and the LM are based on projections of the data in directions determined by interconnection weights.
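
To make the "projection" remark concrete, the following minimal sketch fits a logistic model on one continuous and one categorical variable. Each individual is projected onto a weight vector and the projection is passed through a sigmoid, which has the same form as a single sigmoid unit in a neural network. The one-hot encoding, the learning rate, the epoch count, and the toy samples are assumptions for illustration and are not taken from the paper's case study.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def encode(x, category):
    # Bias term, the continuous variable, and a one-hot flag for category "b".
    return [1.0, x, 1.0 if category == "b" else 0.0]

def project(w, row):
    # Projection of the encoded individual onto the weight direction.
    return sum(wi * xi for wi, xi in zip(w, row))

def fit_logistic(rows, targets, lr=1.0, epochs=2000):
    """Batch gradient descent on the mean logistic loss."""
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, y in zip(rows, targets):
            p = sigmoid(project(w, row))
            for j, xj in enumerate(row):
                grad[j] += (p - y) * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w

def predict(w, row):
    return 1 if sigmoid(project(w, row)) >= 0.5 else 0

# Hypothetical individuals: (continuous value, category, class label).
samples = [(0.1, "a", 0), (0.2, "a", 0), (0.3, "b", 0),
           (0.8, "b", 1), (0.9, "a", 1), (1.0, "b", 1)]
rows = [encode(x, c) for x, c, _ in samples]
targets = [y for _, _, y in samples]

weights = fit_logistic(rows, targets)
predictions = [predict(weights, r) for r in rows]
print(predictions)
```

A neural network with hidden units would apply several such projections in parallel and combine them, which is where the two methods diverge.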


Optimizing Intrusion Detection Pattern Model for Improving Network-based IDS Detection Efficiency

  • Kim, Jai-Myong;Lee, Kyu-Ho;Kim, Jong-Seob;Kim, Kuinam J.
    • Convergence Security Journal / v.1 no.1 / pp.37-45 / 2001
  • In this paper, a separated and optimized pattern database model is proposed. To improve the efficiency of a network-based IDS, the pattern database is partitioned according to a suitable classification basis. The classification basis is determined by whether a specific intrusion is valid against a specific target. Using this model, the IDS searches only the valid patterns in the pattern database for each captured packet. As a result, the IDS uses fewer system resources searching the pattern database and can therefore analyze more packets on the network. In this paper, a suitable classification basis is proposed, a pattern database classified on that basis is constructed, and its performance is verified by experimental results.
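
The idea of searching only the patterns that are valid for a packet's target can be sketched as a pattern database keyed by target. The abstract does not spell out the classification basis, so this sketch assumes a (protocol, port) pair as the target key. The class name `PartitionedPatternDB`, the pattern names, and the byte signatures are hypothetical.

```python
from collections import defaultdict

class PartitionedPatternDB:
    """Pattern database split by target, so a packet is compared only
    against patterns that are valid for its target."""

    def __init__(self):
        self._by_target = defaultdict(list)
        self.size = 0

    def add(self, target, name, signature):
        self._by_target[target].append((name, signature))
        self.size += 1

    def match(self, target, payload):
        """Return (names of matching patterns, number of patterns compared)."""
        candidates = self._by_target.get(target, [])
        hits = [name for name, signature in candidates if signature in payload]
        return hits, len(candidates)

db = PartitionedPatternDB()
# Hypothetical signatures, keyed by an assumed (protocol, port) target.
db.add(("tcp", 80), "dir-traversal", b"../..")
db.add(("tcp", 80), "cmd-exe", b"cmd.exe")
db.add(("tcp", 21), "ftp-root-login", b"USER root")
db.add(("udp", 53), "dns-version-probe", b"version.bind")

payload = b"GET /scripts/../../winnt/system32/cmd.exe HTTP/1.0"
hits, compared = db.match(("tcp", 80), payload)
print(hits, compared, db.size)
```

Here the packet is compared against two of the four stored patterns. In a large database the saving grows with the number of patterns that do not apply to the packet's target.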
