• Title/Summary/Keyword: NB (Naive Bayes)

Morpheme Recovery Based on Naïve Bayes Model (NB 모델을 이용한 형태소 복원)

  • Kim, Jae-Hoon;Jeon, Kil-Ho
    • The KIPS Transactions:PartB
    • /
    • v.19B no.3
    • /
    • pp.195-200
    • /
    • 2012
  • In Korean morphological analysis, spelling changes in surface forms must be recovered to their base forms, and because Korean is agglutinative, part-of-speech (POS) tagging is difficult without morphological analysis. Morpheme recovery is a notorious problem in Korean morphological analysis and has traditionally been handled with recovery rules, which generate morphological ambiguity that POS tagging must then resolve. In this paper, we propose a morpheme recovery scheme based on machine learning methods such as Naïve Bayes models. The input features of the models are the surrounding context of the syllable in which the spelling change occurs, and the categories of the models are the recovered syllables. A POS tagging system using the proposed model achieved an $F_1$-score of 97.5% on the ETRI tree-tagged corpus, which suggests that the proposed model is very useful for handling morpheme recovery in Korean.
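
The idea of predicting a recovered syllable from its surrounding context can be illustrated with a minimal categorical Naive Bayes sketch. This is not the authors' implementation: the class name `NaiveBayes`, the three-slot context window (previous, changed, next syllable), the add-one smoothing, and the placeholder tokens are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical Naive Bayes with add-one smoothing (illustrative sketch)."""

    def __init__(self):
        self.class_counts = Counter()               # label -> number of samples
        self.feature_counts = defaultdict(Counter)  # label -> {(position, value): count}
        self.values = defaultdict(set)              # position -> values seen in training

    def fit(self, samples, labels):
        for feats, label in zip(samples, labels):
            self.class_counts[label] += 1
            for pos, val in enumerate(feats):
                self.feature_counts[label][(pos, val)] += 1
                self.values[pos].add(val)
        return self

    def predict(self, feats):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n / total)
            for pos, val in enumerate(feats):
                count = self.feature_counts[label][(pos, val)]
                vocab = len(self.values[pos]) + 1  # one extra slot for unseen values
                score += math.log((count + 1) / (n + vocab))
            if score > best_score:
                best, best_score = label, score
        return best


# Hypothetical toy data: (previous, changed, next) syllable placeholders,
# labelled with a placeholder for the recovered form.
samples = [("a", "X", "b"), ("a", "X", "c"), ("d", "X", "b"), ("d", "X", "c"), ("a", "Y", "b")]
labels = ["X1", "X1", "X2", "X2", "Y1"]
model = NaiveBayes().fit(samples, labels)
```

In this toy setting the same changed syllable `X` is recovered differently depending on the preceding context, which is the kind of ambiguity the context features are meant to resolve.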

Multi-class Cancer Classification by Integrating OVR SVMs based on Subsumption Architecture (포섭 구조기반 OVR SVM 결합을 통한 다중부류 암 분류)

  • Hong Jin-Hyuk;Cho Sung-Bae
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2006.06a
    • /
    • pp.37-39
    • /
    • 2006
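  • Support vector machines (SVMs) were originally designed for binary classification, but various classifier generation and combination strategies have recently been devised so that they can also be applied to multi-class classification. In this paper, we propose a method that dynamically organizes SVMs generated with the OVR (One-Vs-Rest) strategy by using an NB (Naive Bayes) classifier, thereby effectively resolving the ties that frequently occur in multi-class classification systems based on OVR SVMs. We applied the method to multi-class cancer classification with gene expression data, using the Pearson correlation coefficient to select genes useful for building the NB classifier from the high-dimensional data. We confirmed the usefulness of the proposed method on the GCM cancer dataset, a representative multi-class cancer classification dataset with 14 cancer types and 16,063 gene expression levels.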
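
One plausible reading of "dynamically organizing" OVR SVMs with an NB classifier is to consult the SVMs in the order of the NB posterior probabilities and accept the first class whose SVM fires. The sketch below follows that reading only; the abstract does not give the exact scheme, and the function name, the boolean SVM outputs, and the fallback rule are assumptions.

```python
def classify_with_nb_ordering(svm_accepts, nb_posteriors):
    """Resolve OVR ties by consulting SVMs in NB-posterior order (assumed scheme).

    svm_accepts:   dict mapping class label -> True if that class's OVR SVM
                   accepts the sample (hypothetical precomputed outputs).
    nb_posteriors: dict mapping class label -> NB posterior probability.
    """
    # Most probable class according to NB comes first.
    ranked = sorted(nb_posteriors, key=nb_posteriors.get, reverse=True)
    for label in ranked:
        if svm_accepts.get(label, False):
            return label
    # No SVM accepted the sample: fall back to the top NB class (assumption).
    return ranked[0]


# Two SVMs accept the sample (a tie); NB ordering picks "B".
tie_result = classify_with_nb_ordering(
    {"A": True, "B": True, "C": False},
    {"A": 0.2, "B": 0.5, "C": 0.3},
)
```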

Prediction of Delivery Quality Assurance Via Machine Learning in Helical Tomotherapy (방사선치료 시 다양한 기계학습을 이용한 선량품질관리 결과의 예측)

  • Kyung Hwan Chang
    • Journal of radiological science and technology
    • /
    • v.47 no.4
    • /
    • pp.263-270
    • /
    • 2024
  • The objective of this study was to evaluate the accuracy of various machine learning models, and the impact of leaf open time (LOT) and pitch, in EBT film-based delivery quality assurance (DQA) performed for 211 patients treated with helical tomotherapy (HT). We randomly selected passed (n=191) and failed (n=20) DQA measurements to evaluate the accuracy of the k-nearest neighbor (KNN), support vector machine (SVM), naive Bayes (NB), and logistic regression (LR) models using scale-dependent metrics such as the coefficient of determination ($R^2$), mean squared error (MSE), and root MSE (RMSE). We also evaluated the four prediction models in terms of accuracy, precision, sensitivity, and F1-score using a confusion matrix, and found that the NB and LR models achieved the best results. The results of this study are expected to reduce the workload of medical physicists and dosimetrists by predicting DQA results from LOT and pitch in advance.
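
The four confusion-matrix measures named above follow the standard definitions, sketched here for a binary pass/fail setting. The function name and the counts are invented for illustration and are not results from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, sensitivity (recall), and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
    }


# Invented counts, not taken from the paper.
metrics = binary_metrics(tp=8, fp=2, fn=2, tn=88)
```

With a strongly imbalanced pass/fail split, accuracy alone can look high even when the minority class is poorly detected, which is one reason to report precision, sensitivity, and F1 alongside it.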

Study of oversampling algorithms for soil classifications by field velocity resistivity probe

  • Lee, Jong-Sub;Park, Junghee;Kim, Jongchan;Yoon, Hyung-Koo
    • Geomechanics and Engineering
    • /
    • v.30 no.3
    • /
    • pp.247-258
    • /
    • 2022
  • A field velocity resistivity probe (FVRP) can measure compressional waves, shear waves, and electrical resistivity in boreholes. The objective of this study is to classify soils with a machine learning technique using the elastic wave velocities and electrical resistivity measured by the FVRP. Field and laboratory tests are performed, and the measured values are used as input variables to classify silt sand, sand, silty clay, and clay-sand mixture layers. The k-nearest neighbors (KNN), naive Bayes (NB), random forest (RF), and support vector machine (SVM) algorithms are selected for classification, their hyperparameters are optimized, and their accuracy is evaluated. The accuracies are 0.76, 0.91, 0.94, and 0.88 for the KNN, NB, RF, and SVM algorithms, respectively. To increase the amount of data for each soil layer and overcome the imbalance in the dataset, the synthetic minority oversampling technique (SMOTE) and the conditional tabular generative adversarial network (CTGAN) are applied. The CTGAN improves accuracy for the KNN, NB, RF, and SVM algorithms. The results demonstrate that soil layers can be classified by applying machine learning algorithms to the three kinds of data measured by the FVRP.

Implementation of a Machine Learning-based Recommender System for Preventing the University Students' Dropout (대학생 중도탈락 예방을 위한 기계 학습 기반 추천 시스템 구현 방안)

  • Jeong, Do-Heon
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.10
    • /
    • pp.37-43
    • /
    • 2021
  • This study proposes an effective automatic classification technique for identifying dropout patterns of university students and, based on it, an intelligent recommender system for preventing dropouts. To this end, 1) a data processing method that improves the performance of machine learning was proposed based on actual enrollment/dropout data of university students, and 2) performance comparison experiments were conducted using five types of machine learning algorithms. 3) In the experiments, the proposed method outperformed the baseline method for all algorithms. The precision for identifying enrolled students reached up to 95.6% with Random Forest (RF), and the recall for dropout students reached up to 80.0% with Naive Bayes (NB). 4) Finally, based on the experimental results, a method was suggested for using a counseling recommender system to give priority to students who are likely to drop out. The study confirmed that reasonable decision-making is possible through convergence research that applies IT technologies to educational issues, and we plan to apply various artificial intelligence technologies in continued future research.

A Study on automatic assignment of descriptors using machine learning (기계학습을 통한 디스크립터 자동부여에 관한 연구)

  • Kim, Pan-Jun
    • Journal of the Korean Society for information Management
    • /
    • v.23 no.1 s.59
    • /
    • pp.279-299
    • /
    • 2006
  • This study applies various machine learning approaches to the automatic assignment of descriptors to journal articles. After selecting core journals in the field of information science and building a test collection from articles of the past 11 years, the effectiveness of feature selection and the effect of training set size were examined. Regarding feature selection, the trained Support Vector Machines (SVM) performed best after the feature set was reduced using the $\chi^2$ statistic (CHI) and criteria that prefer high-frequency features (COS, GSS, JAC). The size of the training set significantly influenced the performance of Support Vector Machines (SVM) and Voted Perceptron (VTP), but it had little effect on Naive Bayes (NB).
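
For reference, the $\chi^2$ criterion commonly used for feature selection in text categorization scores a term against a category from a 2×2 contingency table of document counts. The sketch below uses that standard formula; the function name and the counts are assumptions, and the abstract does not say which exact variant the study used.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a category from a 2x2 table of document counts.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    d: documents outside the category that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: term or category never (or always) occurs
    return n * (a * d - c * b) ** 2 / denom


# Invented counts: a term strongly associated with the category.
score = chi_square(a=40, b=10, c=10, d=40)
```

Terms are then ranked by this score and only the top-scoring ones are kept as features; a term distributed independently of the category scores zero.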

Implementation of a bio-inspired two-mode structural health monitoring system

  • Lin, Tzu-Kang;Yu, Li-Chen;Ku, Chang-Hung;Chang, Kuo-Chun;Kiremidjian, Anne
    • Smart Structures and Systems
    • /
    • v.8 no.1
    • /
    • pp.119-137
    • /
    • 2011
  • This paper discusses a bio-inspired two-mode structural health monitoring (SHM) system based on the Naïve Bayes (NB) classification method. To bring the molecular-biology-based deoxyribonucleic acid (DNA) array concept, which has been demonstrated to be superior in disease detection, into structural health monitoring, two types of array expression data are proposed for the development of the SHM algorithm. For the micro-vibration mode, a two-tier auto-regression with exogenous input (AR-ARX) process is used to extract the expression array from the recorded structural time history, while an ARX process is applied for the analysis of the earthquake mode. The health condition of the structure is then determined using the NB classification method. In addition, the union concept in probability is used to improve the accuracy of the system. To verify the performance and reliability of the SHM algorithm, a downscaled eight-storey steel building on the shaking table of the National Center for Research on Earthquake Engineering (NCREE) was used as the benchmark structure. The structural response for different damage levels and locations was collected and incorporated in the database to aid the structural health monitoring process. Preliminary verification demonstrated that the structural health condition can be precisely detected by the proposed algorithm. To implement the developed SHM system in a practical application, an SHM prototype consisting of an input sensing module, a transmission module, and an SHM platform was developed. The vibration data are first measured by the deployed sensor, and the SHM mode corresponding to the excitation is then chosen automatically to quickly evaluate the health condition of the structure. Test results from the ambient vibration and shaking table tests showed that the condition and location of damage in the benchmark structure can be successfully detected by the proposed SHM prototype system, and that the information is instantaneously transmitted to a remote server to facilitate real-time monitoring. The practical implementation of the bio-inspired two-mode SHM system has thus been successfully demonstrated.

Comparative Application of Various Machine Learning Techniques for Lithology Predictions (다양한 기계학습 기법의 암상예측 적용성 비교 분석)

  • Jeong, Jina;Park, Eungyu
    • Journal of Soil and Groundwater Environment
    • /
    • v.21 no.3
    • /
    • pp.21-34
    • /
    • 2016
  • In the present study, we comparatively applied various machine learning techniques to the prediction of subsurface structures based on multiple types of secondary information (i.e., well-logging data). The machine learning techniques employed are Naive Bayes classification (NB), artificial neural network (ANN), support vector machine (SVM), and logistic regression classification (LR). As alternative models, a conventional hidden Markov model (HMM) and a modified hidden Markov model (mHMM) are used, in which additional information on the transition probability between primary properties is incorporated in the predictions. For the comparisons, 16 boreholes consisting of four different materials are synthesized, which show directional non-stationarity in the upward and downward directions. Furthermore, two types of secondary information that are statistically related to each material are generated. In the comparative analysis over various case studies, the accuracies of the techniques degrade when additive errors are included and when the amount of training data is small. Among the HMM predictions, the conventional HMM shows accuracies similar to those of the models that do not rely on transition probability. However, the mHMM consistently shows the highest prediction accuracy among the test cases, which can be attributed to the consideration of geological nature in the training of the model.

A Halal Food Classification Framework Using Machine Learning Method for Enhancing Muslim Tourists (무슬림 관광객 증대를 위한 머신러닝 기반의 할랄푸드 분류 프레임워크)

  • Kim, Sun-A;Kim, Jeong-Won;Won, Dong-Yeon;Choi, Yerim
    • The Journal of Information Systems
    • /
    • v.26 no.3
    • /
    • pp.273-293
    • /
    • 2017
  • Purpose: The purpose of this study is to introduce a framework that helps Muslims determine whether a food can be consumed. It can complement existing Halal food classification services, which have difficulty constructing a Halal food database. Design/methodology/approach: The proposed framework includes two components. First, an OCR (Optical Character Recognition) technique is used to read the food additive information. Second, machine learning methods are trained on the extracted information and used to predict whether a food can be consumed. Findings: Among the compared machine learning methods, SVM (Support Vector Machine), DT (Decision Tree), and NB (Naive Bayes), the SVM with a linear kernel and the DT showed excellent performance in Halal food classification. Services that adopt the proposed framework will enhance the tourism experience of Muslim tourists who place the greatest importance on observing Islamic law. Furthermore, the framework can eventually contribute to the enhancement of the smart tourism ecosystem.

Hyperparameter Tuning Based Machine Learning classifier for Breast Cancer Prediction

  • Md. Mijanur Rahman;Asikur Rahman Raju;Sumiea Akter Pinky;Swarnali Akter
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.2
    • /
    • pp.196-202
    • /
    • 2024
  • Breast cancer (BC) is currently the second most devastating form of cancer in people, particularly in women. In the healthcare industry, machine learning (ML) is commonly employed in fatal disease prediction. Because breast cancer has a favorable prognosis when detected at an early stage, a model is built using the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The main aim of the model is to compare the effectiveness of five well-known ML classifiers, namely Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), K-Nearest Neighbor (KNN), and Naive Bayes (NB), against the conventional method. To improve on the conventional method, we applied hyperparameter tuning with the grid search method, which improved accuracy, precision, recall, and the F1 score. With the hyperparameter tuning model in this study, accuracy increased from 94.15% to 98.83%, whereas the accuracy of the conventional method increased from 93.56% to 97.08%. According to this investigation, KNN outperformed all other classifiers in terms of accuracy, achieving a score of 98.83%. In conclusion, our study shows that KNN works well with the hyperparameter tuning method. These analyses show that the prediction approach of this study is useful for the prognosis of women with breast cancer, with viable performance and more accurate findings than the conventional approach.