• Title/Summary/Keyword: Unsupervised machine learning.

Search Result 139, Processing Time 0.028 seconds

Construction of Linearly Aliened Corpus Using Unsupervised Learning (자율 학습을 이용한 선형 정렬 말뭉치 구축)

  • Lee, Kong-Joo;Kim, Jae-Hoon
    • The KIPS Transactions:PartB
    • /
    • v.11B no.3
    • /
    • pp.387-394
    • /
    • 2004
  • In this paper, we propose a modified unsupervised linear alignment algorithm for building an aligned corpus. The original algorithm inserts null characters into both of two aligned strings (source string and target string), because the two strings are different from each other in length. This can cause some difficulties like the search space explosion for applications using the aligned corpus with null characters and no possibility of applying to several machine learning algorithms. To alleviate these difficulties, we modify the algorithm not to contain null characters in the aligned source strings. We have shown the usability of our approach by applying it to different areas such as Korean-English back-trans literation, English grapheme-phoneme conversion, and Korean morphological analysis.

A Study on Atmospheric Data Anomaly Detection Algorithm based on Unsupervised Learning Using Adversarial Generative Neural Network (적대적 생성 신경망을 활용한 비지도 학습 기반의 대기 자료 이상 탐지 알고리즘 연구)

  • Yang, Ho-Jun;Lee, Seon-Woo;Lee, Mun-Hyung;Kim, Jong-Gu;Choi, Jung-Mu;Shin, Yu-mi;Lee, Seok-Chae;Kwon, Jang-Woo;Park, Ji-Hoon;Jung, Dong-Hee;Shin, Hye-Jung
    • Journal of Convergence for Information Technology
    • /
    • v.12 no.4
    • /
    • pp.260-269
    • /
    • 2022
  • In this paper, We propose an anomaly detection model using deep neural network to automate the identification of outliers of the national air pollution measurement network data that is previously performed by experts. We generated training data by analyzing missing values and outliers of weather data provided by the Institute of Environmental Research and based on the BeatGAN model of the unsupervised learning method, we propose a new model by changing the kernel structure, adding the convolutional filter layer and the transposed convolutional filter layer to improve anomaly detection performance. In addition, by utilizing the generative features of the proposed model to implement and apply a retraining algorithm that generates new data and uses it for training, it was confirmed that the proposed model had the highest performance compared to the original BeatGAN models and other unsupervised learning model like Iforest and One Class SVM. Through this study, it was possible to suggest a method to improve the anomaly detection performance of proposed model while avoiding overfitting without additional cost in situations where training data are insufficient due to various factors such as sensor abnormalities and inspections in actual industrial sites.

Opera Clustering: K-means on librettos datasets

  • Jeong, Harim;Yoo, Joo Hun
    • Journal of Internet Computing and Services
    • /
    • v.23 no.2
    • /
    • pp.45-52
    • /
    • 2022
  • With the development of artificial intelligence analysis methods, especially machine learning, various fields are widely expanding their application ranges. However, in the case of classical music, there still remain some difficulties in applying machine learning techniques. Genre classification or music recommendation systems generated by deep learning algorithms are actively used in general music, but not in classical music. In this paper, we attempted to classify opera among classical music. To this end, an experiment was conducted to determine which criteria are most suitable among, composer, period of composition, and emotional atmosphere, which are the basic features of music. To generate emotional labels, we adopted zero-shot classification with four basic emotions, 'happiness', 'sadness', 'anger', and 'fear.' After embedding the opera libretto with the doc2vec processing model, the optimal number of clusters is computed based on the result of the elbow method. Decided four centroids are then adopted in k-means clustering to classify unsupervised libretto datasets. We were able to get optimized clustering based on the result of adjusted rand index scores. With these results, we compared them with notated variables of music. As a result, it was confirmed that the four clusterings calculated by machine after training were most similar to the grouping result by period. Additionally, we were able to verify that the emotional similarity between composer and period did not appear significantly. At the end of the study, by knowing the period is the right criteria, we hope that it makes easier for music listeners to find music that suits their tastes.

A Machine learning Approach for Knowledge Base Construction Incorporating GIS Data for land Cover Classification of Landsat ETM+ Image (지식 기반 시스템에서 GIS 자료를 활용하기 위한 기계 학습 기법에 관한 연구 - Landsat ETM+ 영상의 토지 피복 분류를 사례로)

  • Kim, Hwa-Hwan;Ku, Cha-Yang
    • Journal of the Korean Geographical Society
    • /
    • v.43 no.5
    • /
    • pp.761-774
    • /
    • 2008
  • Integration of GIS data and human expert knowledge into digital image processing has long been acknowledged as a necessity to improve remote sensing image analysis. We propose inductive machine learning algorithm for GIS data integration and rule-based classification method for land cover classification. Proposed method is tested with a land cover classification of a Landsat ETM+ multispectral image and GIS data layers including elevation, aspect, slope, distance to water bodies, distance to road network, and population density. Decision trees and production rules for land cover classification are generated by C5.0 inductive machine learning algorithm with 350 stratified random point samples. Production rules are used for land cover classification integrated with unsupervised ISODATA classification. Result shows that GIS data layers such as elevation, distance to water bodies and population density can be effectively integrated for rule-based image classification. Intuitive production rules generated by inductive machine learning are easy to understand. Proposed method demonstrates how various GIS data layers can be integrated with remotely sensed imagery in a framework of knowledge base construction to improve land cover classification.

Application of Machine Learning Techniques for Resolving Korean Author Names (한글 저자명 중의성 해소를 위한 기계학습기법의 적용)

  • Kang, In-Su
    • Journal of the Korean Society for information Management
    • /
    • v.25 no.3
    • /
    • pp.27-39
    • /
    • 2008
  • In bibliographic data, the use of personal names to indicate authors makes it difficult to specify a particular author since there are numerous authors whose personal names are the same. Resolving same-name author instances into different individuals is called author resolution, which consists of two steps: calculating author similarities and then clustering same-name author instances into different person groups. Author similarities are computed from similarities of author-related bibliographic features such as coauthors, titles of papers, publication information, using supervised or unsupervised methods. Supervised approaches employ machine learning techniques to automatically learn the author similarity function from author-resolved training samples. So far however, a few machine learning methods have been investigated for author resolution. This paper provides a comparative evaluation of a variety of recent high-performing machine learning techniques on author disambiguation, and compares several methods of processing author disambiguation features such as coauthors and titles of papers.

A Comparison Study of Classification Algorithms in Data Mining

  • Lee, Seung-Joo;Jun, Sung-Rae
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.8 no.1
    • /
    • pp.1-5
    • /
    • 2008
  • Generally the analytical tools of data mining have two learning types which are supervised and unsupervised learning algorithms. Classification and prediction are main analysis tools for supervised learning. In this paper, we perform a comparison study of classification algorithms in data mining. We make comparative studies between popular classification algorithms which are LDA, QDA, kernel method, K-nearest neighbor, naive Bayesian, SVM, and CART. Also, we use almost all classification data sets of UCI machine learning repository for our experiments. According to our results, we are able to select proper algorithms for given classification data sets.

Lung Cancer Classification and Detection Using Deep Learning Technique

  • K.Sudha Rani;A.Suma Latha;S.Sunitha Ratnam;J.Bhavani;J.Srinivasa Rao;N.Kavitha Rao
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.10
    • /
    • pp.81-90
    • /
    • 2024
  • Lung cancer is a complex and frightening disease that typically results in death in both men and women. Therefore, it is more crucial to thoroughly and swiftly evaluate the malignant nodules. Recent years have seen the development of numerous strategies for diagnosing lung cancer, most of which use CT imaging. These techniques include supervisory and non-supervisory procedures. This study revealed that computed tomography scans are more suitable for obtaining reliable results. Lung cancer cannot be accurately predicted using unsupervised approaches. As a result, supervisory techniques are crucial in lung cancer prediction. Convolutional neural networks (CNNs) based on deep learning techniques has been used in this paper. Convolutional neural networks (CNN)-based deep learning procedures have produced results that are more precise than those produced by traditional machine learning procedures. A number of statistical measures, including accuracy, precision, and f1, have been computed.

Design of Hybrid Unsupervised-Supervised Classifier for Automatic Emotion Recognition (자동 감성 인식을 위한 비교사-교사 분류기의 복합 설계)

  • Lee, JeeEun;Yoo, Sun K.
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.63 no.9
    • /
    • pp.1294-1299
    • /
    • 2014
  • The emotion is deeply affected by human behavior and cognitive process, so it is important to do research about the emotion. However, the emotion is ambiguous to clarify because of different ways of life pattern depending on each individual characteristics. To solve this problem, we use not only physiological signal for objective analysis but also hybrid unsupervised-supervised learning classifier for automatic emotion detection. The hybrid emotion classifier is composed of K-means, genetic algorithm and support vector machine. We acquire four different kinds of physiological signal including electroencephalography(EEG), electrocardiography(ECG), galvanic skin response(GSR) and skin temperature(SKT) as well as we use 15 features extracted to be used for hybrid emotion classifier. As a result, hybrid emotion classifier(80.6%) shows better performance than SVM(31.3%).

Medical Image Analysis Using Artificial Intelligence

  • Yoon, Hyun Jin;Jeong, Young Jin;Kang, Hyun;Jeong, Ji Eun;Kang, Do-Young
    • Progress in Medical Physics
    • /
    • v.30 no.2
    • /
    • pp.49-58
    • /
    • 2019
  • Purpose: Automated analytical systems have begun to emerge as a database system that enables the scanning of medical images to be performed on computers and the construction of big data. Deep-learning artificial intelligence (AI) architectures have been developed and applied to medical images, making high-precision diagnosis possible. Materials and Methods: For diagnosis, the medical images need to be labeled and standardized. After pre-processing the data and entering them into the deep-learning architecture, the final diagnosis results can be obtained quickly and accurately. To solve the problem of overfitting because of an insufficient amount of labeled data, data augmentation is performed through rotation, using left and right flips to artificially increase the amount of data. Because various deep-learning architectures have been developed and publicized over the past few years, the results of the diagnosis can be obtained by entering a medical image. Results: Classification and regression are performed by a supervised machine-learning method and clustering and generation are performed by an unsupervised machine-learning method. When the convolutional neural network (CNN) method is applied to the deep-learning layer, feature extraction can be used to classify diseases very efficiently and thus to diagnose various diseases. Conclusions: AI, using a deep-learning architecture, has expertise in medical image analysis of the nerves, retina, lungs, digital pathology, breast, heart, abdomen, and musculo-skeletal system.

Network Intrusion Detection System Using Feature Extraction Based on AutoEncoder in IOT environment (IOT 환경에서의 오토인코더 기반 특징 추출을 이용한 네트워크 침입탐지 시스템)

  • Lee, Joohwa;Park, Keehyun
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.8 no.12
    • /
    • pp.483-490
    • /
    • 2019
  • In the Network Intrusion Detection System (NIDS), the function of classification is very important, and detection performance depends on various features. Recently, a lot of research has been carried out on deep learning, but network intrusion detection system experience slowing down problems due to the large volume of traffic and a high dimensional features. Therefore, we do not use deep learning as a classification, but as a preprocessing process for feature extraction and propose a research method from which classifications can be made based on extracted features. A stacked AutoEncoder, which is a representative unsupervised learning of deep learning, is used to extract features and classifications using the Random Forest classification algorithm. Using the data collected in the IOT environment, the performance was more than 99% when normal and attack traffic are classified into multiclass, and the performance and detection rate were superior even when compared with other models such as AE-RF and Single-RF.