• Title/Summary/Keyword: supervised and unsupervised classification

A Study on the Classification of Variables Affecting Smartphone Addiction in Decision Tree Environment Using Python Program

  • Kim, Seung-Jae
    • International journal of advanced smart convergence / v.11 no.4 / pp.68-80 / 2022
  • Since the advent of AI, technology development aimed at implementing complete and sophisticated AI functions has continued. Efforts to develop technologies for complete automation mainly rely on machine learning and deep learning techniques. These techniques use supervised learning, unsupervised learning, and reinforcement learning as internal technical elements, and in turn use big data analysis to lay the groundwork for decision-making. Established decisions are then improved through repeated revision of the decision-making standards. In other words, big data analysis, which enables data classification and recognition, is important enough to be called a key technical element of AI, and it therefore requires sophisticated analysis. In this study, among the various tools that can analyze big data, we use a Python program to find out which variables affect smartphone addiction in a decision tree environment. Using the Python program, we check whether data classification by decision tree shows the same performance as with other tools, and whether it can lend reliability to decision-making about the addictiveness of smartphone use. The results of this study indicate that big data analysis can be performed without problems using any of the various statistical tools, such as Python and R.
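As a rough illustration of how a decision tree identifies influential variables, the sketch below scores every candidate binary split by weighted Gini impurity and returns the best one. It is not the paper's code: the function names, the two features (daily usage hours and age), and the six sample rows are all hypothetical, chosen only to make the mechanism visible.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split.

    A row goes left when row[feature_index] <= threshold, otherwise right.
    Returns None when no split separates the rows into two non-empty groups.
    """
    best = None
    for j in range(len(rows[0])):
        for t in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= t]
            right = [y for row, y in zip(rows, labels) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

# Hypothetical data: (daily usage hours, age) and an addiction label.
rows = [(1.0, 30), (2.0, 45), (5.5, 22), (6.0, 41), (7.5, 19), (1.5, 25)]
labels = ["no", "no", "yes", "yes", "yes", "no"]

feature, threshold, impurity = best_split(rows, labels)
print(feature, threshold, impurity)
```

In this toy data the split on feature 0 (usage hours) at 2.0 separates the two classes completely, so a tree would rank that variable first. A full tree applies the same search recursively to each resulting group.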

A Comparative Study on Discretization Algorithms for Data Mining (데이터 마이닝을 위한 이산화 알고리즘에 대한 비교 연구)

  • Choi, Byong-Su;Kim, Hyun-Ji;Cha, Woon-Ock
    • Communications for Statistical Applications and Methods / v.18 no.1 / pp.89-102 / 2011
  • Discretization, which converts continuous attributes into discrete ones, is a preprocessing step in data mining tasks such as classification. Some classification algorithms can handle only discrete attributes. The purpose of discretization is to obtain discretized data without losing the information in the original data and to obtain high predictive accuracy when the discretized data are used in classification. Many discretization algorithms have been developed. This paper presents the results of our comparative study of recently proposed representative discretization algorithms from the viewpoints of splitting versus merging and supervised versus unsupervised. We implemented the discretization algorithms in R and made the code publicly available.
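To make the idea of unsupervised discretization concrete, the sketch below shows two textbook schemes, equal-width and equal-frequency binning. These are not necessarily among the algorithms the paper compares, and the function names and sample values are assumptions made for illustration. Python is used here rather than the R of the paper's own implementation.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k bins of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        # All values are identical, so everything falls into bin 0.
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]

def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

# Hypothetical attribute with one outlier.
values = [1, 2, 3, 100, 4, 5]
print(equal_width_bins(values, 3))      # the outlier stretches the bin width
print(equal_frequency_bins(values, 3))  # counts per bin stay balanced
```

Neither scheme looks at class labels, which is what makes them unsupervised. Supervised methods instead choose cut points using the class distribution.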

Cell Images Classification using Deep Convolutional Autoencoder of Unsupervised Learning (비지도학습의 딥 컨벌루셔널 자동 인코더를 이용한 셀 이미지 분류)

  • Vununu, Caleb;Park, Jin-Hyeok;Kwon, Oh-Jun;Lee, Suk-Hwan;Kwon, Ki-Ryong
    • Annual Conference of KIPS / 2021.11a / pp.942-943 / 2021
  • The present work proposes a classification system for HEp-2 cell images using an unsupervised deep feature learning method. Unlike most state-of-the-art methods in the literature, which use deep learning in a strictly supervised way, we propose the deep convolutional autoencoder (DCAE) as the principal feature extractor for classifying the different types of HEp-2 cell images. The network takes the original cell images as inputs and learns to reconstruct them in order to capture the features related to the global shape of the cells. A final feature vector is constructed from the latent representations extracted from the DCAE, giving a highly discriminative feature representation. The resulting features are fed to a nonlinear classifier whose output represents the final type of the cell image. We tested the discriminability of the proposed features on one of the most popular HEp-2 cell classification datasets, the SNPHEp-2 dataset, and the results show that the proposed features capture the distinctive characteristics of the different cell types while performing at least as well as current deep learning-based state-of-the-art methods.

Monitoring Spatiotemporal Changes of Tidal Flats in Go-Gunsan Islands by Environmental Factors using Satellite Images (위성영상을 활용한 환경 요인에 따른 고군산 군도 간석지의 시공간적 변화 탐지)

  • Lee, Hong-Ro;Lee, Jae-Bong
    • Journal of the Korean Association of Geographic Information Studies / v.8 no.3 / pp.34-43 / 2005
  • We detect spatio-temporal changes in the unknown topography of the Go-Gunsan Islands by applying an unsupervised ISODATA classification and a supervised nearest likelihood classification to Landsat TM satellite images. Each sedimentary topography shows different characteristics following the construction of the Saemangeum embankment. We examine the distribution of sedimentary shapes using ERDAS Imagine 8.6. The classification of the topographic properties of the research area is expected to be useful for establishing the reclamation program and for predicting the reclaimed sedimentary topography. The research area can be classified into tidal flats and sea level using band 4 of the 7 Landsat TM bands. Band 5 can also be used to classify the particular unknown shapes of tidal flats. We demonstrate that spatio-temporal sedimentary changes can be extracted efficiently by processing satellite images. Monitoring changes in the unknown topography using satellite images is therefore expected to be very useful for determining whether tidal flats form in the Go-Gunsan Islands on the outer side of the embankment once the Saemangeum tidal embankment is completed.

Noun and Keyword Extraction for Information Processing of Korean (한국어 정보처리를 위한 명사 및 키워드 추출)

  • Shin, Seong-Yoon;Rhee, Yang-Won
    • Journal of the Korea Society of Computer and Information / v.14 no.3 / pp.51-56 / 2009
  • Noun and keyword extraction is a key element of information processing in any language. For Korean language information processing, however, noun and keyword extraction still poses many problems. This paper proposes an effective noun extraction method that considers noun emergence features. The proposed method can be used effectively in areas such as information retrieval, where large volumes of documents and data must be processed quickly. This paper also presents a category-based keyword construction method that uses an unsupervised learning technique so that high volumes of queries are classified automatically. Our experimental results show that the proposed method outperformed both the supervised learning-based χ² method, known to excel in keyword extraction, and the DF method in terms of classification precision.

Artificial Intelligence for Clinical Research in Voice Disease (후두음성 질환에 대한 인공지능 연구)

  • Jungirl, Seok;Tack-Kyun, Kwon
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / v.33 no.3 / pp.142-155 / 2022
  • Diagnosis using voice is non-invasive and can be implemented through various voice recording devices; therefore, it can be used as a screening or diagnostic assistant tool for laryngeal voice disease to help clinicians. The development of artificial intelligence algorithms, such as machine learning, led by the latest deep learning technology, began with binary classification that distinguishes normal from pathological voices and has since contributed to improving the accuracy of multi-class classification of various types of pathological voices. However, no conclusions that can be applied in the clinical field have yet been reached. Most studies on pathological voice classification have used the continuous short vowel /ah/, which is relatively easier to analyze than continuous or running speech. However, continuous speech has the potential to yield more accurate results because additional information can be obtained from the change in the voice signal over time. This review explains terms related to artificial intelligence research and surveys the latest trends in machine learning and deep learning algorithms; it also introduces the latest research results and their limitations to provide future directions for researchers.

Comparative Study of Tokenizer Based on Learning for Sentiment Analysis (고객 감성 분석을 위한 학습 기반 토크나이저 비교 연구)

  • Kim, Wonjoon
    • Journal of Korean Society for Quality Management / v.48 no.3 / pp.421-431 / 2020
  • Purpose: The purpose of this study is to compare and analyze tokenizers used in natural language processing for sentiment analysis of customer satisfaction. Methods: A supervised learning-based tokenizer, Mecab-Ko, and an unsupervised learning-based tokenizer, SentencePiece, were compared. Three algorithms (Naïve Bayes, k-Nearest Neighbor, and Decision Tree) were selected to compare the performance of each tokenizer, and three metrics (accuracy, precision, and recall) were used for the comparison. Results: Performance evaluation and verification confirmed that SentencePiece shows better classification performance than Mecab-Ko. To confirm the robustness of the derived results, independent t-tests were conducted on the evaluation results for the two tokenizers. The classification performance of the SentencePiece tokenizer was confirmed to be high with the k-Nearest Neighbor and Decision Tree algorithms. In addition, the Decision Tree showed slightly higher accuracy than the other two classification algorithms. Conclusion: The SentencePiece tokenizer can be used to classify and interpret customer sentiment in Korean online reviews more accurately. It also appears possible to assign a specific meaning to short words or jargon that users often employ when evaluating products but that are not defined in advance.
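The three metrics named in the abstract can be computed from predicted and true labels as in the minimal sketch below. It assumes a binary sentiment task with label 1 as the positive class. The function name and the sample labels are hypothetical and do not come from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Fall back to 0.0 when a denominator is empty (no predicted or no true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
```

Computing the same triple for each tokenizer and classifier pairing gives the kind of table on which tokenizers can be compared.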

Detection of Red Tide Patches using AVHRR and Landsat TM data (AVHRR과 Landsat TM 자료를 이용한 적조 패취 관측)

  • Jeong, Jong-Chul
    • Journal of Environmental Impact Assessment / v.10 no.1 / pp.1-8 / 2001
  • Red tides can be detected by satellite remote sensing either by detecting enhanced levels of chlorophyll pigment or by detecting changes in the spectral composition of pixels. Using chlorophyll concentration, however, is currently not effective for two reasons: 1) chlorophyll-a is a universal pigment of phytoplankton, and 2) no accurate algorithm for chlorophyll in case 2 water is available yet. Here, a red band algorithm, classification, and PCA (Principal Component Analysis) techniques were applied to detect patches of Cochlodinium polykrikoides red tides that occurred in Korean waters in 1995. This dinoflagellate species appears dark red because its characteristic pigments absorb light most effectively in the blue and green wavelengths. In the satellite image, the brightness of red tide pixels was low in all three visible bands, making detection difficult. The red band algorithm is not well suited to detecting the red tide because of the reflectance of suspended sediments. For supervised classification, selecting training areas was difficult, while unsupervised classification was not effective in delineating the patches from surrounding pixels. PCA, on the other hand, gave a good qualitative discrimination of the distribution compared with actual observation.

Review of Land Cover Classification Potential in River Spaces Using Satellite Imagery and Deep Learning-Based Image Training Method (딥 러닝 기반 이미지 트레이닝을 활용한 하천 공간 내 피복 분류 가능성 검토)

  • Woochul, Kang;Eun-kyung, Jang
    • Ecology and Resilient Infrastructure / v.9 no.4 / pp.218-227 / 2022
  • This study attempted deep learning-based image training for land cover classification in river spaces, which is important information for efficient river management. For this purpose, land cover classification of the RGB image of the target section was conducted, based on the category classification index of the major land cover map, using the outcomes learned from the labeling results. In addition, land cover classification of the river spaces was performed by unsupervised and supervised classification of openly provided Sentinel-2 satellite images, and this was compared with the results of the deep learning-based image classification. The analysis showed that the deep learning-based classification gave more accurate predictions than unsupervised classification, and that its results improved significantly for high-resolution images. The results of this study show the possibility of classifying water areas and wetlands in river spaces, and with additional research the deep learning-based image training method for land cover classification could be used for river management.

Emerging Machine Learning in Wearable Healthcare Sensors

  • Gandha Satria Adi;Inkyu Park
    • Journal of Sensor Science and Technology / v.32 no.6 / pp.378-385 / 2023
  • Human biosignals provide essential information for diagnosing diseases such as dementia and Parkinson's disease. Owing to the shortcomings of current clinical assessments, noninvasive solutions are required. Machine learning (ML) on wearable sensor data is a promising method for the real-time monitoring and early detection of abnormalities. ML facilitates disease identification, severity measurement, and remote rehabilitation by providing continuous feedback. In the context of wearable sensor technology, ML involves training on observed data for tasks such as classification and regression with applications in clinical metrics. Although supervised ML presents challenges in clinical settings, unsupervised learning, which focuses on tasks such as cluster identification and anomaly detection, has emerged as a useful alternative. This review examines and discusses a variety of ML algorithms such as Support Vector Machines (SVM), Random Forests (RF), Decision Trees (DT), Neural Networks (NN), and Deep Learning for the analysis of complex clinical data.