• Title/Summary/Keyword: unsupervised learning (비지도 학습)

A new cluster validity index based on connectivity in self-organizing map (자기조직화지도에서 연결강도에 기반한 새로운 군집타당성지수)

  • Kim, Sangmin;Kim, Jaejik
    • The Korean Journal of Applied Statistics / v.33 no.5 / pp.591-601 / 2020
  • The self-organizing map (SOM) is an unsupervised learning method that projects high-dimensional data onto low-dimensional nodes. Using these nodes, it can visualize data in 2- or 3-dimensional space, and the nodes can be used to explore the characteristics of the data. To understand the structure of the data, cluster analysis is often applied to the nodes obtained from a SOM. In cluster analysis, determining the optimal number of clusters is one of the most important issues. To help determine it, various cluster validity indexes have been developed, and they can be applied to clustering results for SOM nodes. However, although an advantage of SOM is that it preserves the topological properties of the original data in the low-dimensional space, these indexes do not take those properties into account. We therefore propose a new cluster validity index for SOM based on the connectivity between nodes, which reflects the topological properties of the data. The performance of the proposed index is evaluated through simulations and compared with that of various existing cluster validity indexes.
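
The abstract does not give the formula of the proposed index. As a minimal sketch of what "connectivity between nodes" can mean, the Python code below builds a CONN-style connectivity matrix of the kind used in some SOM visualization work: every data point adds one unit of strength between its best-matching node and its second-best-matching node. The scoring function `within_cluster_connectivity` is a hypothetical illustration (the share of connectivity that stays inside clusters), not the authors' index.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def connectivity_matrix(data, prototypes):
    """CONN[i][j] counts the data points whose best-matching unit (BMU) and
    second BMU are nodes i and j; the matrix is symmetric with a zero diagonal."""
    n = len(prototypes)
    conn = [[0] * n for _ in range(n)]
    for x in data:
        # Rank all nodes by distance to x and keep the two closest.
        ranked = sorted(range(n), key=lambda k: squared_distance(x, prototypes[k]))
        first, second = ranked[0], ranked[1]
        conn[first][second] += 1
        conn[second][first] += 1
    return conn


def within_cluster_connectivity(conn, labels):
    """Hypothetical score: fraction of the total connectivity strength that
    links nodes assigned to the same cluster (higher = fewer cut connections)."""
    within = total = 0
    for i in range(len(conn)):
        for j in range(i + 1, len(conn)):
            total += conn[i][j]
            if labels[i] == labels[j]:
                within += conn[i][j]
    return within / total if total else 0.0
```

On its own this score reaches 1.0 when all nodes share one cluster, so it cannot pick the number of clusters by itself; a usable validity index would also need a term that penalizes merging well-separated groups.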

An Exploratory Study of Collective E-Petitions Estimation Methodology Using Anomaly Detection: Focusing on the Voice of Citizens of Changwon City (이상탐지 활용 전자집단민원 추정 방법론에 관한 탐색적 연구: 창원시 시민의 소리 사례를 중심으로)

  • Jeong, Ha-Yeong
    • Informatization Policy / v.26 no.4 / pp.85-106 / 2019
  • Recently, an increasing number of collective petitions have been filed through the electronic petition system. However, there is no efficient system for managing them, raising concerns about side effects such as increased administrative workload and widespread social conflict. Aiming to propose a methodology for estimating electronic collective petitions using anomaly detection and corpus-linguistics-based content analysis, this study conducted the following: i) a theoretical review of the concept of collective petitions, ii) estimation of electronic collective petitions using anomaly detection based on nonparametric unsupervised learning, iii) a content similarity analysis of petitions using n-gram cosine angle distance, and iv) a case study of the Voice of Citizens of Changwon City, through which the utility of the proposed methodology, its policy implications, and future tasks were reviewed.
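
The abstract names two ingredients without giving their exact form. The sketch below, in Python, assumes character bigrams as the n-grams and a median/MAD rule as the nonparametric anomaly detector; both choices, and the function names `flag_spikes` and `cosine_angle_distance`, are hypothetical stand-ins rather than the study's procedure. Whitespace is dropped before building n-grams so that Korean spacing variants of the same petition still match.

```python
import math
from collections import Counter
from statistics import median


def char_ngrams(text, n=2):
    """Character n-gram counts, ignoring whitespace."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine_similarity(a, b, n=2):
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values()) * sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def cosine_angle_distance(a, b, n=2):
    """Angle in radians between n-gram vectors: 0 for identical texts, pi/2 for disjoint ones."""
    return math.acos(max(-1.0, min(1.0, cosine_similarity(a, b, n))))


def flag_spikes(daily_counts, k=3.0):
    """Nonparametric outlier rule: flag days whose petition count exceeds the
    median by more than k median absolute deviations (MAD)."""
    med = median(daily_counts)
    mad = median([abs(c - med) for c in daily_counts])
    if mad == 0:
        return [i for i, c in enumerate(daily_counts) if c > med]
    return [i for i, c in enumerate(daily_counts) if (c - med) / mad > k]
```

A spike flagged by `flag_spikes` would then be a candidate collective petition only if the petitions filed on that day are also textually close under `cosine_angle_distance`.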

KR-WordRank : An Unsupervised Korean Word Extraction Method Based on WordRank (KR-WordRank : WordRank를 개선한 비지도학습 기반 한국어 단어 추출 방법)

  • Kim, Hyun-Joong;Cho, Sungzoon;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers / v.40 no.1 / pp.18-33 / 2014
  • A word is the smallest unit of text analysis, and most text-mining algorithms assume that the words in given documents can be perfectly recognized. However, newly coined words, spelling and spacing errors, and domain adaptation problems make it difficult to recognize words correctly. To make matters worse, obtaining enough training data to cover every situation is not only unrealistic but also inefficient. An automatic word extraction method that does not require a training process is therefore urgently needed. WordRank, the most widely used unsupervised word extraction algorithm for Chinese and Japanese, performs poorly on Korean because of differences in language structure. In this paper, we first discuss why WordRank performs poorly on Korean, and then propose KR-WordRank, a WordRank algorithm customized for Korean that takes its linguistic characteristics into account and is more robust to noise in text documents. Experimental results show that KR-WordRank performs significantly better than the original WordRank on Korean. In addition, the proposed algorithm not only extracts proper words but also identifies candidate keywords for effective document summarization.
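
A heavily simplified Python sketch of the underlying idea, under stated assumptions: each whitespace token (eojeol) is split into a prefix L and a suffix R, L and R are linked in a bipartite graph, and HITS-style mutual reinforcement scores them, so an L that combines with frequent particles (R) ranks high. The function names and the `max_len` parameter are hypothetical, and the sketch omits KR-WordRank's noise handling and its removal of redundant subwords.

```python
from collections import defaultdict


def lr_graph(sentences, max_len=6):
    """Edges (L, R) for every split of a token into a non-empty prefix L
    (at most max_len characters) and a non-empty suffix R, weighted by count."""
    edges = defaultdict(int)
    for sentence in sentences:
        for token in sentence.split():
            for i in range(1, min(len(token), max_len + 1)):
                edges[(token[:i], token[i:])] += 1
    return edges


def hits_rank(edges, iterations=30):
    """HITS-style mutual reinforcement: an L-part scores high if it combines with
    high-scoring R-parts (e.g. common particles), and vice versa."""
    if not edges:
        return []
    l_score = {l: 1.0 for l, _ in edges}
    for _ in range(iterations):
        r_score = defaultdict(float)
        for (l, r), w in edges.items():
            r_score[r] += w * l_score[l]
        z = sum(r_score.values())
        r_score = {r: s / z for r, s in r_score.items()}
        new_l = defaultdict(float)
        for (l, r), w in edges.items():
            new_l[l] += w * r_score[r]
        z = sum(new_l.values())
        l_score = {l: s / z for l, s in new_l.items()}  # L1-normalized scores
    return sorted(l_score.items(), key=lambda kv: kv[1], reverse=True)
```

On a toy corpus such as `["예술가는 작품은 사람은", "예술가를 작품을 사람을", "예술가의 작품의 사람의"]`, the nouns 작품, 사람, and 예술가 outrank fragments such as 예 or 예술, because they share particle suffixes with other nouns while the fragments do not.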

Patterning Waterbirds Occurrences at the Western Costal Area of the Korean Peninsula in Winter Using a Self-organizing Map (인공신경회로망을 이용한 서해안 겨울철 수조류의 발생특성 유형화)

  • Park, Young-Seuk;Lee, Who-Seung;Nam, Hyung-Kyu;Lee, Ki-Sup;Yoo, Jeong-Chil
    • Korean Journal of Environmental Biology / v.25 no.2 / pp.149-157 / 2007
  • This study focused on patterning the wintering occurrences of waterbirds in the western coastal area of the Korean Peninsula and relating the occurrence patterns to environmental factors. Waterbird communities were monitored at 10 study areas, and the composition of land cover was estimated at each study area as an environmental factor. Overall, dabbling ducks were the most abundant, accounting for 84% of all individuals, followed by shorebirds and diving ducks. Anas platyrhynchos was the most dominant species, followed by Anas formosa. A self-organizing map (SOM), an unsupervised artificial neural network, was applied to pattern the wintering waterbird communities and identified 6 groups according to differences in community composition. Each group reflected differences in indicator species as well as in habitat.
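
To make the SOM step concrete, here is a minimal online SOM in plain Python, assuming each study area is a vector of (e.g. relative) species abundances. The grid size, learning-rate schedule, Gaussian neighbourhood, and random initialization inside the data's bounding box are all illustrative assumptions, not the settings used in the study.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(weights)),
               key=lambda k: sum((w - xi) ** 2 for w, xi in zip(weights[k], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Online SOM on a rows x cols grid with a Gaussian neighbourhood.
    Returns one prototype (weight vector) per node, in row-major order."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(x[d] for x in data) for d in range(dim)]
    hi = [max(x[d] for x in data) for d in range(dim)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Random initialisation inside the data's bounding box.
    weights = [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in grid]
    sigma0 = max(rows, cols) / 2
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)  # shrinking neighbourhood radius
            br, bc = grid[best_matching_unit(weights, x)]
            for k, (r, c) in enumerate(grid):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                # Move each prototype toward x, scaled by its grid distance to the BMU.
                weights[k] = [w + lr * h * (xi - w) for w, xi in zip(weights[k], x)]
            step += 1
    return weights
```

After training, sites that share a best-matching node (or neighbouring nodes) form a community pattern; grouping the trained nodes themselves, for example with hierarchical clustering, is a common way to arrive at a small number of groups such as the six reported here.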

Automatic Meeting Summary System using Enhanced TextRank Algorithm (향상된 TextRank 알고리즘을 이용한 자동 회의록 생성 시스템)

  • Bae, Young-Jun;Jang, Ho-Taek;Hong, Tae-Won;Lee, Hae-Yeoun
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.5 / pp.467-474 / 2018
  • Organizing and documenting the contents of meetings and discussions is very important in many tasks. In the past, however, people had to organize these contents manually. In this paper, we describe the development of a system that automatically generates meeting minutes using the TextRank algorithm. The proposed system records all of the speakers' utterances in real time and calculates sentence similarity based on occurrence frequency. To create the meeting minutes, it then extracts important words or phrases using an unsupervised learning algorithm that finds the relations between sentences in the document data. In particular, we improved performance by introducing a keyword weighting technique into TextRank, which adapts the PageRank algorithm to words and sentences.
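
A minimal sentence-level TextRank sketch in Python, for illustration only. Edge weights follow the common word-overlap similarity normalized by the logarithms of the sentence lengths, and scores follow the PageRank update with damping factor d = 0.85. The `keyword_weights` map, which multiplies the contribution of shared keywords, is a hypothetical stand-in: the abstract does not specify how the authors weight keywords.

```python
import math
import re


def tokenize(sentence):
    return re.findall(r"\w+", sentence.lower())


def textrank(sentences, keyword_weights=None, d=0.85, iterations=50):
    """Sentence-level TextRank. Edge weight = (weighted) word overlap normalised
    by log sentence lengths; keyword_weights is a hypothetical boost map."""
    kw = keyword_weights or {}
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weight = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if not tokens[i] or not tokens[j]:
                continue
            shared = set(tokens[i]) & set(tokens[j])
            overlap = sum(kw.get(t, 1.0) for t in shared)
            denom = math.log(len(tokens[i])) + math.log(len(tokens[j]))
            if overlap and denom > 0:
                weight[i][j] = weight[j][i] = overlap / denom
    out = [sum(row) for row in weight]
    scores = [1.0] * n
    for _ in range(iterations):
        # PageRank update: S(i) = (1 - d) + d * sum_j w_ji / out_j * S(j)
        scores = [(1 - d) + d * sum(weight[j][i] / out[j] * scores[j]
                                    for j in range(n) if out[j] > 0)
                  for i in range(n)]
    return scores


def summarize(sentences, k=2, **kwargs):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences, **kwargs)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A sentence that shares no words with the rest of the meeting (for example, small talk) keeps only the baseline score 1 - d and is dropped from the minutes.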

Feature-based Gene Classification and Region Clustering using Gene Expression Grid Data in Mouse Hippocampal Region (쥐 해마의 유전자 발현 그리드 데이터를 이용한 특징기반 유전자 분류 및 영역 군집화)

  • Kang, Mi-Sun;Kim, HyeRyun;Lee, Sukchan;Kim, Myoung-Hee
    • Journal of KIISE / v.43 no.1 / pp.54-60 / 2016
  • Brain gene expression information is closely related to the structural and functional characteristics of the brain. Accordingly, extensive research has been carried out on the relationship between gene expression patterns and the brain's structural organization. In this study, principal component analysis (PCA) was used to extract features of gene expression patterns, and genes were automatically classified by their spatial distribution. Voxels were then clustered using the genes classified as region-specific. Finally, we visualized the clustering results for gene expression in the mouse hippocampal region together with the Allen Brain Atlas. This experiment allowed us to classify region-specific gene expression in the mouse hippocampal region and to visualize the clustering results and a brain atlas in an integrated manner. This approach could allow neuroscientists to find experimental groups of genes more quickly and to design effective experiments based on this new form of data. It is also expected to enable the discovery of more specific sub-regions beyond the currently known anatomical regions of the brain.
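
The PCA feature-extraction step can be sketched in plain Python with power iteration on the sample covariance matrix. The data layout (rows = genes, columns = expression values at grid voxels) is an assumption for illustration, and only the first principal axis is computed. The study's actual preprocessing and number of components are not stated in the abstract.

```python
import math


def first_principal_component(rows, iterations=100):
    """Mean vector and unit-length first principal axis of `rows`
    (samples x features), via power iteration on the sample covariance."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    # Start from the most extreme centred sample so the start vector is non-zero
    # (unless every row is identical, in which case there is no variance at all).
    v = max(centered, key=lambda c: sum(x * x for x in c))
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return means, v


def project(rows, means, axis):
    """Score of each row on the principal axis."""
    return [sum((r[j] - means[j]) * axis[j] for j in range(len(axis))) for r in rows]
```

The sign of the returned axis is arbitrary, so only the relative ordering or grouping of the projected scores is meaningful when genes are classified by them.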

Recognition of a New Car License Plate Using HSI Information, Fuzzy Binarization and ART2 Algorithm (HSI 정보와 퍼지 이진화 및 ART2 알고리즘을 이용한 신차량 번호판의 인식)

  • Kim, Kwang-Baek;Woo, Young-Woon;Park, Choong-Shik
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.5 / pp.1004-1012 / 2007
  • In this paper, we propose a new car license plate recognition method that uses the unsupervised ART2 algorithm with the HSI color model. The proposed method consists of two main modules: extracting the plate area from a vehicle image, and then recognizing the characters in the plate. To extract the plate area, the hue (H) component of the HSI color model is used, and the sub-area containing the characters is obtained with a modified fuzzy binarization method. Individual characters are then separated with a 4-directional edge tracking algorithm. To recognize the separated characters, the noise-robust ART2 algorithm is employed. When applied to license plate character recognition, the proposed method achieves a better extraction rate than the existing RGB-model-based approach and an overall recognition rate of about 97.4%.
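
The plate-extraction step relies on the hue component of HSI. Below is a Python sketch of the standard geometric RGB-to-HSI conversion as given in common image-processing textbooks, assuming channel values scaled to [0, 1]. The hue range that identifies a plate is not given in the abstract, so no threshold is shown, and the fuzzy binarization and ART2 stages are not covered.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert r, g, b in [0, 1] to (H in degrees, S, I).
    H is None for achromatic pixels (black and grays), where hue is undefined."""
    i = (r + g + b) / 3
    if i == 0:
        return (None, 0.0, 0.0)
    s = 1 - min(r, g, b) / i
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))  # always >= 0
    if den == 0:
        return (None, s, i)
    # Clamp guards against rounding pushing the ratio just outside [-1, 1].
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    h = theta if b <= g else 360.0 - theta
    return (h, s, i)
```

Working in hue rather than raw RGB makes the plate colour less sensitive to illumination changes, which is consistent with the abstract's comparison against an RGB-based approach.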

Analysis of Massive Scholarly Keywords using Inverted-Index based Bottom-up Clustering (역인덱스 기반 상향식 군집화 기법을 이용한 대규모 학술 핵심어 분석)

  • Oh, Heung-Seon;Jung, Yuchul
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.11 / pp.758-764 / 2018
  • Digital documents such as patents, scholarly papers, and research reports have author keywords that summarize their topics. Documents that share the same keywords are likely to describe the same topic. Document clustering aims to group documents with similar topics using an unsupervised learning method. However, although document clustering is used in various kinds of data analysis, its computational complexity makes it difficult to apply to a large number of documents. In this case, keywords can be used to cluster and connect massive numbers of documents efficiently. Existing bottom-up hierarchical clustering, however, requires heavy computation and has high time complexity when clustering a large number of keywords. This paper proposes an inverted-index-based bottom-up clustering method for keywords and analyzes the clustering results for massive numbers of keywords extracted from scholarly papers and research reports.
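
The following Python sketch shows why an inverted index helps. Only keyword pairs that co-occur in at least one document are ever compared, and single-linkage bottom-up merging at a fixed similarity threshold reduces to union-find over those pairs. The Jaccard similarity of document sets and the 0.5 threshold are assumptions for illustration; the paper's similarity measure and merge criterion may differ.

```python
from collections import defaultdict


def build_inverted_index(doc_keywords):
    """keyword -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, kws in doc_keywords.items():
        for kw in kws:
            index[kw].add(doc_id)
    return index


def cluster_keywords(doc_keywords, threshold=0.5):
    """Single-linkage clustering cut at `threshold` (Jaccard similarity of the
    keywords' document sets), computed with union-find."""
    index = build_inverted_index(doc_keywords)
    parent = {kw: kw for kw in index}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Candidate pairs come from the index: keywords that never co-occur are
    # never compared, avoiding the all-pairs cost of naive agglomeration.
    candidates = set()
    for a, docs in index.items():
        for doc_id in docs:
            for b in doc_keywords[doc_id]:
                if a < b:
                    candidates.add((a, b))
    for a, b in candidates:
        da, db = index[a], index[b]
        if len(da & db) / len(da | db) >= threshold:
            parent[find(a)] = find(b)
    clusters = defaultdict(set)
    for kw in index:
        clusters[find(kw)].add(kw)
    return list(clusters.values())
```

With a fixed threshold, the connected components produced here are exactly the clusters obtained by cutting a single-linkage dendrogram at that similarity level.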

Near Realtime Packet Classification & Handling Mechanism for Visualized Security Management in Cloud Environments (클라우드 환경에서 보안 가시성 확보를 위한 자동화된 패킷 분류 및 처리기법)

  • Ahn, Myong-ho;Ryoo, Mi-hyeon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.10a / pp.331-337 / 2014
  • The paradigm shift to cloud computing has increased the importance of security. Even though public cloud providers such as Amazon already provide security-related services such as firewalls and identity management, these services are not sufficient to protect data in cloud environments, because public cloud environments do not allow clients to use their own security solutions or equipment. In such environments, users must enhance security themselves, which gives rise to the need for visualized security management. To implement visualized security management, it is crucial to develop near-real-time data handling and packet classification mechanisms. The key technical challenge in packet classification is how to classify packets in an unsupervised way, without human interaction. To achieve this goal, this paper presents an automated packet classification mechanism based on naive Bayesian and packet chunking techniques, which can identify signatures and learn by itself without human intervention.
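
The Python sketch below illustrates how payload chunking and naive Bayes combine: payloads are split into fixed-size byte chunks, and a multinomial naive Bayes with add-one smoothing scores them per class. Naive Bayes itself is a supervised classifier. The abstract does not say how training labels are produced without humans, so here they are simply assumed to come from some automated step. The chunk size of 4 and the class name `ChunkNaiveBayes` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def chunk(payload: bytes, size: int = 4):
    """Split a payload into fixed-size, non-overlapping byte chunks (features)."""
    return [payload[i:i + size] for i in range(0, len(payload), size)]


class ChunkNaiveBayes:
    """Multinomial naive Bayes over byte chunks with Laplace (add-one) smoothing."""

    def __init__(self, size: int = 4):
        self.size = size
        self.class_counts = Counter()
        self.chunk_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, payloads, labels):
        for payload, label in zip(payloads, labels):
            feats = chunk(payload, self.size)
            self.class_counts[label] += 1
            self.chunk_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, payload):
        feats = chunk(payload, self.size)
        total_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.chunk_counts[label]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods of each chunk
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[f] + 1) / (total + v)) for f in feats)
            if score > best_score:
                best, best_score = label, score
        return best
```

Chunks that recur in one class (such as `b"HTTP"`) act like learned signatures, which is the intuition behind combining chunking with a probabilistic classifier.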

An Automatically Extracting Formal Information from Unstructured Security Intelligence Report (비정형 Security Intelligence Report의 정형 정보 자동 추출)

  • Hur, Yuna;Lee, Chanhee;Kim, Gyeongmin;Jo, Jaechoon;Lim, Heuiseok
    • Journal of Digital Convergence / v.17 no.11 / pp.233-240 / 2019
  • To predict and respond to cyber attacks, a number of security companies quickly identify the methods, types, and characteristics of attack techniques and publish Security Intelligence Reports (SIRs) on them. However, the SIRs distributed by each company are huge and unstructured. In this paper, we propose a framework that applies five analysis techniques to structure these reports and extract key information, in order to reduce the time required to extract information from large, unstructured SIRs. Since SIR data have no ground-truth labels, we apply four unsupervised learning techniques: keyword extraction, topic modeling, summarization, and document similarity. Finally, we built a dataset for extracting threat information from SIRs and applied Named Entity Recognition (NER) to recognize words belonging to the IP, Domain/URL, Hash, and Malware types and to determine which type each word belongs to. In total, the proposed framework applies five analysis techniques, including NER.
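
The paper uses a trained NER model. As a much simpler baseline, and to show what the IP, Domain/URL, and Hash entity types look like in practice, the Python sketch below extracts them with regular expressions. This is a swapped-in technique, not the authors' method. The patterns are illustrative assumptions (IPv4 only, MD5/SHA-1/SHA-256 hex lengths), and malware family names are omitted because they have no fixed surface form and genuinely need a learned model.

```python
import re

# Illustrative patterns; real reports also contain defanged forms like hxxp or [.]
PATTERNS = {
    "IP": re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"),
    "Domain/URL": re.compile(r"https?://[^\s\"'<>]+|\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE),
    "Hash": re.compile(r"\b(?:[a-f0-9]{64}|[a-f0-9]{40}|[a-f0-9]{32})\b", re.IGNORECASE),
}


def extract_iocs(text):
    """Return {entity type: [unique matches in order of appearance]}."""
    found = {label: [] for label in PATTERNS}
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            value = m.group(0).rstrip(".,;)")  # drop trailing sentence punctuation
            if value not in found[label]:
                found[label].append(value)
    return found
```

A regex baseline like this is precise for well-formed indicators but cannot resolve context (for example, whether a domain is malicious or merely cited), which is one motivation for a learned NER component.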