• Title/Abstract/Keywords: Unsupervised

822 search results, processing time 0.023 s

Keyword Extraction Using Unsupervised Learning Method

  • 신성윤;백정욱;이양원
    • 한국정보통신학회:학술대회논문집 / 한국해양정보통신학회 2010 Spring Conference / pp.165-166 / 2010
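  • Noun extraction is the task of finding every noun in a document; in Korean information retrieval, nouns serve as the index terms or keywords that represent a document. This paper presents a method for extracting keywords using a pre-built dictionary. The method shortens running time by eliminating unnecessary computation, and it can extract nouns from large documents without greatly affecting accuracy. The paper presents a noun extraction method based on the occurrence characteristics of nouns and a keyword extraction method based on unsupervised learning.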

Particulate Distribution Map of Tidal Flat using Unsupervised Classification of Multi-Temporary Satellite Data

  • 정종철
    • 대한원격탐사학회지 / Vol. 18, No. 2 / pp.71-79 / 2002
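  • This study presents a sediment grain-size distribution map of the Hampyeong Bay tidal flat, using the grain composition of tidal-flat sediments obtained from a field survey and reflectance values extracted from satellite imagery of the same period. Spectra extracted from Landsat TM data were analyzed by grain composition, and seven satellite images were classified with the ISODATA and K-MEANS methods. The accuracy of the unsupervised classification was evaluated against field observations; ISODATA and K-MEANS reached 84.3% and 85.7%, respectively. To validate the multi-temporal classification results, they were compared against the May 1999 TM image, classified using field-survey data, as the reference.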
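
As a rough illustration of the K-MEANS step named in the abstract above, the following is a minimal sketch of Lloyd's algorithm in plain Python. The pixel values, the two-band layout, the function name `kmeans`, and the choice of the first `k` points as starting centroids are all assumptions made for this example; the abstract does not describe the authors' data format or initialization.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means on a list of equal-length tuples."""
    # Deterministic start for this sketch: the first k points are the centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical two-band reflectance values for six pixels (not data from the paper).
pixels = [(0.10, 0.12), (0.80, 0.85), (0.11, 0.10),
          (0.82, 0.80), (0.09, 0.13), (0.79, 0.83)]
labels, centroids = kmeans(pixels, k=2)
```

Cluster numbers produced this way are arbitrary, so comparing them with field observations, as the study does, would first require mapping each cluster to a reference class.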

Unsupervised Image Classification Using Spatial Region Growing Segmentation and Hierarchical Clustering

  • 이상훈
    • 대한원격탐사학회지 / Vol. 17, No. 1 / pp.57-69 / 2001
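  • This study proposes a multistage technique for unsupervised image classification that first segments the image by spatial region growing and then classifies the resulting segments into a limited number of classes. The proposed algorithm is based on hierarchical clustering, which merges small clusters step by step into larger ones. The segmentation stage merges spatially adjacent neighboring regions until, across the whole image, every pair of neighboring segments differs in physical characteristics. The classification stage then groups the segments into a relatively small number of classes, with no spatial constraint on which regions may merge. To reduce the computational and memory complexity of hierarchical clustering, the algorithm uses mutual closest neighbor pairs and multi-window operation. The algorithm was evaluated and its efficiency verified on simulated data, and results from applying it to LANDSAT ETM+ data of the Yongin-Neungpyeong area in Gyeonggi Province are presented.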
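
The classification stage described above, merging segments step by step without spatial constraints until few classes remain, can be sketched as basic agglomerative clustering. This is a simplified illustration only: the region means are invented, the function name `agglomerate` is hypothetical, clusters are compared by the distance between their means, and the mutual-closest-neighbor and multi-window optimizations mentioned in the abstract are omitted.

```python
def agglomerate(values, n_classes):
    """Merge the two closest clusters repeatedly until n_classes remain."""
    # Each cluster is a list of indices into `values`; start with one per region.
    clusters = [[i] for i in range(len(values))]

    def mean(cluster):
        return sum(values[i] for i in cluster) / len(cluster)

    while len(clusters) > n_classes:
        # Exhaustive search for the pair of clusters with the closest means.
        a, b = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: abs(mean(clusters[ij[0]]) - mean(clusters[ij[1]])),
        )
        # Merge cluster b into cluster a (b > a, so deleting b is safe).
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical mean intensities of six segmented regions (illustrative only).
region_means = [1.0, 1.2, 5.0, 5.1, 9.0, 9.3]
classes = agglomerate(region_means, n_classes=3)
```

The exhaustive pair search at every merge is what makes naive hierarchical clustering costly on large images, which is the cost the abstract says the proposed algorithm works to reduce.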

Arabic Text Clustering Methods and Suggested Solutions for Theme-Based Quran Clustering: Analysis of Literature

  • Bsoul, Qusay;Abdul Salam, Rosalina;Atwan, Jaffar;Jawarneh, Malik
    • Journal of Information Science Theory and Practice / Vol. 9, No. 4 / pp.15-34 / 2021
  • Text clustering is one of the most commonly used methods for detecting themes or types of documents. It is used in many fields, but it is still not effective enough for understanding Arabic text, especially with respect to term extraction, unsupervised feature selection, and clustering algorithms. In most cases, term extraction focuses on nouns. Clustering simplifies the understanding of an Arabic text such as the Quran, which is important not only for Muslims but for all people who want to know more about Islam. This paper discusses the complexity and limitations of theme-based clustering of Arabic text in the Quran. Unsupervised feature selection does not consider the relationships between the selected features. One weakness of clustering algorithms is that the selection of the optimal initial centroid still depends on chance and manual settings. Consequently, this paper reviews the literature on the three major stages of Arabic clustering: term extraction, unsupervised feature selection, and clustering. Six experiments were conducted to demonstrate previously undiscussed problems related to the metrics used for feature selection and clustering. Suggestions to improve theme-based clustering of the Quran are presented and discussed.

Comparison of Word Extraction Methods Based on Unsupervised Learning for Analyzing East Asian Traditional Medicine Texts

  • 오준호
    • 대한한의학원전학회지 / Vol. 32, No. 3 / pp.47-57 / 2019
  • Objectives: We aim to assist in choosing an appropriate method for word extraction when analyzing East Asian Traditional Medical texts based on unsupervised learning. Methods: To assign ranks to substrings, we conducted a test using one method (BE: Branching Entropy) for the exterior boundary value, three methods (CS: cohesion score, TS: t-score, SL: simple-ll) for the interior boundary value, and six methods (BExSL, BExTS, BExCS, CSxTS, CSxSL, TSxSL) that combine them. Results: When Miss Rate (MR) was used as the criterion, the error was smallest when TS and SL were used together and largest when CS was used alone. When the number of segmented texts was applied as a weight, the results were best with SL and worst with BE alone. Conclusions: Unsupervised-learning-based word extraction is a method that can be used to analyze texts without a prepared set of vocabulary data. When using this method, SL or the combination of SL and TS should be considered first.
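
To make the exterior boundary measure concrete, here is a minimal sketch of right branching entropy as it is commonly defined: the entropy of the characters that follow a substring in a corpus, where a higher value suggests a likelier word boundary. The abstract does not give the authors' exact formulation, so the function name, the use of bits, and the toy corpus are assumptions for illustration.

```python
import math
from collections import Counter

def right_branching_entropy(corpus, substring):
    """Entropy (in bits) of the characters that follow `substring` in `corpus`."""
    followers = Counter()
    start = corpus.find(substring)
    while start != -1:
        end = start + len(substring)
        if end < len(corpus):
            followers[corpus[end]] += 1  # count the next character
        # Advance by one so overlapping occurrences are also found.
        start = corpus.find(substring, start + 1)
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in followers.values())

# Toy corpus, not text from the paper.
corpus = "abcx abcy abcz abd"
h_ab = right_branching_entropy(corpus, "ab")    # followers: c, c, c, d
h_abc = right_branching_entropy(corpus, "abc")  # followers: x, y, z
```

In this toy corpus "abc" is followed by three different characters while "ab" is almost always followed by "c", so `h_abc` exceeds `h_ab` and a boundary after "abc" is the more plausible one.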

Unsupervised Outpatients Clustering: A Case Study in Avissawella Base Hospital, Sri Lanka

  • Hoang, Huu-Trung;Pham, Quoc-Viet;Kim, Jung Eon;Kim, Hoon;Park, Junseok;Hwang, Won-Joo
    • 한국멀티미디어학회논문지 / Vol. 22, No. 4 / pp.480-490 / 2019
  • Electronic Medical Record (EMR) systems have so far been implemented for the Outpatient Department (OPD) at only a few hospitals. OPD data are diverse, covering patient demographics and diseases, so they need to be clustered to uncover hidden rules and relationships among the types of patient information. In this paper, we propose a novel approach for unsupervised clustering of patient demographics and diseases in the OPD. First, we collect OPD data from a hospital. We then preprocess and transform the data using standardization, label encoding, and categorical encoding. On the transformed data, we run experiments and evaluations to select the best number of clusters and the best clustering algorithm. In addition, we use tests and measurements to analyze and evaluate cluster tendency, models, and algorithms. Finally, we analyze the results to discover new knowledge, meanings, and rules. The clusters found in this research provide knowledge to medical managers and doctors, who can use this information to improve patient management, patient arrangement, and doctors' capabilities. The work also serves as a reference for medical data scientists mining OPD datasets.

Partial Discharge Data Analysis with Unsupervised Classification

  • 조경순;홍선학
    • 디지털산업정보학회논문지 / Vol. 14, No. 4 / pp.9-16 / 2018
  • This study describes the analysis of partial discharge (PD) distribution at the interface between XLPE (cross-linked polyethylene) and EPDM (ethylene propylene diene monomer) using unsupervised classification. The ${\phi}-q-n$ patterns were analyzed using phase-resolved partial discharge (PRPD). K-means cluster analysis forms clusters based on similarities and distances among scattered individuals and analyzes the characteristics of the resulting clusters; it is a statistical analysis that divides multivariate data into several groups according to the similarity of each characteristic, making the data easier to explore. The initial phase distribution, localized around $90^{\circ}$ and $300^{\circ}$, expanded to the whole phase-angle range as the voltage rose, and at 30 kV the phase angle of the cluster with the maximum discharge charge was concentrated around $0^{\circ}$ and $180^{\circ}$. The Euclidean distance between the center of gravity and the discharge charge in the ${\phi}-q$ cluster increased with increasing applied voltage.

Decision support system for underground coal pillar stability using unsupervised and supervised machine learning approaches

  • Kamran, Muhammad;Shahani, Niaz Muhammad;Armaghani, Danial Jahed
    • Geomechanics and Engineering / Vol. 30, No. 2 / pp.107-121 / 2022
  • Coal pillar assessment is of broad importance to underground engineering structures, as pillar failure can lead to enormous disasters. Because of the highly non-linear correlation between pillar failure and its influential attributes, conventional forecasting techniques cannot generate accurate outcomes. To approximate the complex behavior of coal pillars, this paper presents a new approach to forecasting underground coal pillar stability using combined unsupervised-supervised learning. To build the study database, a total of 90 pillar cases were collected from real engineering structures. A state-of-the-art dimensionality reduction method, t-distributed stochastic neighbor embedding (t-SNE), was employed to reduce the dimensionality of the original data features. An unsupervised machine learning technique, K-means clustering, was then applied to the t-SNE-reduced data to compute the relative class of each coal pillar case. Following that, the reassigned dataset was divided into two parts: 70 percent for training and 30 percent for testing. The accuracy of the predictions was then examined using support vector classifier (SVC) performance measures such as precision, recall, and F1-score. As a result, the proposed model can be employed to predict the pillar failure class in a variety of underground rock engineering projects.
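
The evaluation step in the abstract above relies on a 70/30 split and on precision, recall, and F1-score. The sketch below shows how those quantities are computed for one class treated as positive. The class names and the five example labels are hypothetical and are not results from the paper; only the 90 cases and the 70/30 ratio come from the abstract.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# A 70/30 split of the 90 cases mentioned in the abstract.
n_cases = 90
n_train = round(n_cases * 0.7)
n_test = n_cases - n_train

# Hypothetical labels for a handful of test cases (not results from the paper).
y_true = ["stable", "failed", "failed", "stable", "failed"]
y_pred = ["stable", "failed", "stable", "stable", "failed"]
p, r, f1 = precision_recall_f1(y_true, y_pred, positive="failed")
```

With these made-up labels, one failed pillar is predicted as stable, which lowers recall while precision stays at its maximum; in a safety setting such missed failures are usually the costlier error.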

Vibration-based structural health monitoring using CAE-aided unsupervised deep learning

  • Minte, Zhang;Tong, Guo;Ruizhao, Zhu;Yueran, Zong;Zhihong, Pan
    • Smart Structures and Systems / Vol. 30, No. 6 / pp.557-569 / 2022
  • Vibration-based structural health monitoring (SHM) is crucial for the dynamic maintenance of civil building structures to protect property and the lives of the public. Analyzing these vibrations with modern artificial intelligence and deep learning (DL) methods is a new trend. This paper proposes an unsupervised deep learning method based on a convolutional autoencoder (CAE), which can overcome the limitations of conventional supervised deep learning. With the convolutional core applied to the DL network, the method can extract features self-adaptively and efficiently. The effectiveness of the method in detecting damage is first tested using a benchmark model. The method is then used to detect damage and sudden disaster events in a rubber bearing-isolated gymnasium structure. The results indicate that the method enables the CAE network to learn the intact vibrations and thereby distinguish between different damage states of the benchmark model, and that the outcome agrees with the high-dimensional data distribution characteristics visualized by the t-SNE method. In addition, the CAE-based network trained with daily vibrations of the isolating layer in the gymnasium can precisely reconstruct newly collected vibrations and detect the occurrence of ground motion. The proposed method is effective at identifying nonlinear variations in the dynamic responses and has the potential to be used for structural condition assessment and safety warning.

Audio-Visual Content Analysis Based Clustering for Unsupervised Debate Indexing

  • 금지수;이현수
    • 한국음향학회지 / Vol. 27, No. 5 / pp.244-251 / 2008
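  • This study proposes an unsupervised debate indexing method that uses audio-visual information. The proposed method combines audio clustering results obtained with the BIC (Bayesian Information Criterion) and video clustering results obtained with a distance-based function. Combining audio and visual information reduces the problems that arise when clustering with either one alone and enables effective content-based analysis of debate data. To evaluate the method, performance was measured on five different types of debate data, using audio and visual information separately and using both together. The experiments confirmed that combining audio and visual information is more effective for debate indexing than using either one alone.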