Search | Korea Science

Naive Bayes classifiers boosted by sufficient dimension reduction: applications to top-k classification

Yang, Su Hyeong;Shin, Seung Jun;Sung, Wooseok;Lee, Choon Won
- Communications for Statistical Applications and Methods
- v.29 no.5
- pp.603-614
- 2022
The naive Bayes classifier is one of the most straightforward classification tools and directly estimates class probabilities. However, because it relies on the assumption that the predictors are independent, which is rarely satisfied in real-world problems, its application is limited in practice. In this article, we propose employing sufficient dimension reduction (SDR) to substantially improve the performance of the naive Bayes classifier, which often deteriorates when the number of predictors is not small. This is not surprising, as SDR reduces the predictor dimension without sacrificing classification information, and the predictors in the reduced space are constructed to be uncorrelated. Therefore, SDR makes the naive Bayes classifier no longer naive. We applied the proposed naive Bayes classifier after SDR to build a recommendation system for eyewear frames based on customers' face shapes, demonstrating its utility in the top-k classification problem.
https://doi.org/10.29220/CSAM.2022.29.5.603
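A minimal sketch of "naive Bayes after SDR" for top-k classification, under stated assumptions: the abstract does not say which SDR estimator or class-conditional density the authors use, so this assumes sliced inverse regression (SIR) with one slice per class and a Gaussian naive Bayes on the reduced predictors. The function names (`sir_directions`, `fit_gaussian_nb`, `predict_proba`, `top_k`) and the synthetic data are hypothetical, for illustration only.

```python
import numpy as np

def sir_directions(X, y, d):
    """Sliced inverse regression (SIR) with one slice per class; returns a p x d basis."""
    p = X.shape[1]
    mu = X.mean(axis=0)
    # Whitening matrix Sigma^{-1/2} from the eigendecomposition of the sample covariance
    evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
    inv_sqrt = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    Z = (X - mu) @ inv_sqrt
    # Weighted outer products of the whitened class means (between-class covariance)
    M = np.zeros((p, p))
    for c in np.unique(y):
        zc = Z[y == c].mean(axis=0)
        M += np.mean(y == c) * np.outer(zc, zc)
    m_evals, m_evecs = np.linalg.eigh(M)
    leading = m_evecs[:, np.argsort(m_evals)[::-1][:d]]
    return inv_sqrt @ leading  # directions expressed on the original predictor scale

def fit_gaussian_nb(U, y):
    """Per-class priors, means, and variances of the reduced predictors."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([U[y == c].mean(axis=0) for c in classes])
    variances = np.array([U[y == c].var(axis=0) + 1e-9 for c in classes])
    return classes, priors, means, variances

def predict_proba(model, U):
    """Posterior class probabilities under the naive (independence) assumption."""
    _, priors, means, variances = model
    # log N(u_j | mean_cj, var_cj), summed over reduced coordinates j -> shape (n, n_classes)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + (U[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]).sum(axis=2)
    log_post = np.log(priors)[None, :] + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability before exp
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def top_k(model, U, k):
    """Labels of the k most probable classes for each row, most probable first."""
    order = np.argsort(-predict_proba(model, U), axis=1)[:, :k]
    return model[0][order]

# Toy data: 4 classes in 10 dimensions (synthetic, for illustration only)
rng = np.random.default_rng(0)
centers = rng.normal(scale=3.0, size=(4, 10))
X = np.vstack([rng.normal(size=(100, 10)) + centers[c] for c in range(4)])
y = np.repeat(np.arange(4), 100)

B = sir_directions(X, y, d=3)   # SIR yields at most (number of classes - 1) informative directions
U = (X - X.mean(axis=0)) @ B    # reduced predictors, uncorrelated in the sample
model = fit_gaussian_nb(U, y)
print(top_k(model, U[:5], k=2))
```

Because the whitened SIR coordinates have identity sample covariance, the reduced predictors are uncorrelated by construction, which is the property the abstract relies on; note that uncorrelated is weaker than independent unless the reduced predictors are jointly Gaussian. In a recommendation setting, `top_k` would return the k highest-posterior labels as candidates.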

Practical method to improve usage efficiency of bike-sharing systems

Lee, Chun-Hee;Lee, Jeong-Woo;Jung, YungJoon
- ETRI Journal
- v.44 no.2
- pp.244-259
- 2022
Bicycle- or bike-sharing systems (BSSs) have received increasing attention as a secondary mode of transportation due to their advantages, such as accessibility, prevention of air pollution, and health promotion. However, because bike demand is unevenly distributed, BSSs must solve the bike rebalancing problem. Various methods have been proposed for this problem, but they are difficult to apply to small cities, where bike demand is sparse and many practical issues remain to be solved. Thus, we propose a demand prediction model that uses multiple classifiers, time grouping, categorization, weather analysis, and station correlation information. In addition, we analyze real-world relocation data from relocation managers and, based on the analytical results, propose a relocation algorithm to solve the bike rebalancing problem. The proposed system is compared experimentally with the results obtained by actual relocation managers.
https://doi.org/10.4218/etrij.2021-0408

Neural Networks-Based Method for Electrocardiogram Classification

Maksym Kovalchuk;Viktoriia Kharchenko;Andrii Yavorskyi;Igor Bieda;Taras Panchenko
- International Journal of Computer Science & Network Security
- v.23 no.9
- pp.186-191
- 2023
Neural networks are widely used to solve a huge variety of tasks. Machine learning methods are also used for signal and time-series analysis, including electrocardiograms. Contemporary wearable devices, both medical and non-medical (such as smartwatches), can gather data continuously in real time. These data can then be transferred for analysis or analyzed on the device itself, providing a preliminary diagnosis or at least flagging serious deviations. Different methods are used for this kind of analysis, ranging from medically oriented methods that use distinctive features of the signal to machine learning and deep learning approaches. Here we demonstrate a neural network-based approach to this task by building an ensemble of 1D CNN classifiers with a final selection classifier based on logistic regression, random forest, or support vector machine, and we draw conclusions from a comparison with other approaches.
https://doi.org/10.22937/IJCSNS.2023.23.9.24

A Method for Generating and Combining Classifiers for Large Scale Data (Korean title: Classifier Generation and Combination Method for Large-Scale Document Learning)

Jeong, Do-Heon;Hwang, Myung-Gwon;Sung, Won-Kyung
- Proceedings of the Korea Information Processing Society Conference
- 2011.04a
- pp.1551-1554
- 2011
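Automatic categorization techniques based on large-scale learning that can be applied in large-volume data environments, and that can be used for general purposes, are arguably the most necessary technical components for information analysis and information service environments that must process large amounts of information. In this paper, we introduce a large-scale classifier generation technique that splits a large document collection into unit components, trains on each, and combines them dynamically, and we examine the applicability of this technique by comparing its automatic categorization performance with that of an SVM model.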
https://doi.org/10.3745/PKIPS.y2011m04a.1551

The use of support vector machines in semi-supervised classification

Bae, Hyunjoo;Kim, Hyungwoo;Shin, Seung Jun
- Communications for Statistical Applications and Methods
- v.29 no.2
- pp.193-202
- 2022
Semi-supervised learning has gained significant attention in recent applications. In this article, we provide a selective overview of popular semi-supervised methods and then propose a simple but effective algorithm for semi-supervised classification using support vector machines (SVM), one of the most popular binary classifiers in the machine learning community. The idea is as follows. First, we apply dimension reduction to the unlabeled observations and cluster them to assign labels in the reduced space. SVM is then applied to the combined set of labeled and unlabeled observations to construct a classification rule. The use of SVM enables us to extend the method to its nonlinear counterpart via the kernel trick. Our numerical experiments under various scenarios demonstrate that the proposed method is promising for semi-supervised classification.
https://doi.org/10.29220/CSAM.2022.29.2.193
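The three steps in the abstract (dimension reduction on the unlabeled observations, clustering to assign labels, then SVM on the combined set) can be sketched as below. The abstract does not specify the dimension-reduction method, the clustering algorithm, how clusters are matched to labels, or the SVM solver, so this sketch assumes PCA, k-means initialized at the labeled-class centroids (each cluster inherits that class's label), and a linear soft-margin SVM trained by subgradient descent instead of a kernel SVM. All function names and the toy data are hypothetical.

```python
import numpy as np

def pca_basis(X, d):
    """Mean and top-d principal directions (p x d) of X, via SVD."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:d].T

def pseudo_label(X_lab, y_lab, X_unl, d=2, n_iter=20):
    """Reduce the unlabeled data with PCA, run k-means there, and give each cluster
    the label of the labeled-class centroid it started from (assumed matching rule)."""
    mu, V = pca_basis(X_unl, d)
    Z_lab, Z_unl = (X_lab - mu) @ V, (X_unl - mu) @ V
    classes = np.unique(y_lab)
    centers = np.array([Z_lab[y_lab == c].mean(axis=0) for c in classes])
    for _ in range(n_iter):  # Lloyd's k-means iterations
        dist = ((Z_unl[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(len(classes)):
            if np.any(assign == j):
                centers[j] = Z_unl[assign == j].mean(axis=0)
    return classes[assign]

def linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM (labels in {-1, +1}) by full-batch subgradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1  # margin violators contribute to the hinge subgradient
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data: two Gaussian groups in 5 dimensions, only six labeled points
rng = np.random.default_rng(1)
X_all = np.vstack([rng.normal(-2.0, 1.0, size=(100, 5)), rng.normal(2.0, 1.0, size=(100, 5))])
y_all = np.repeat([-1, 1], 100)
lab_idx = np.array([0, 1, 2, 100, 101, 102])
unl_mask = np.ones(len(X_all), dtype=bool)
unl_mask[lab_idx] = False

y_pseudo = pseudo_label(X_all[lab_idx], y_all[lab_idx], X_all[unl_mask])
X_train = np.vstack([X_all[lab_idx], X_all[unl_mask]])
y_train = np.concatenate([y_all[lab_idx], y_pseudo])
w, b = linear_svm(X_train, y_train)
acc = np.mean(np.sign(X_all @ w + b) == y_all)
print(f"accuracy on all points: {acc:.2f}")
```

Replacing `linear_svm` with a kernel SVM would correspond to the nonlinear extension the abstract mentions; the pseudo-labeling step is unchanged.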

Performance Analysis of Explainers for Sentiment Classifiers of Movie Reviews

Park, Cheon-Young;Lee, Kong Joo
- Annual Conference on Human and Language Technology
- 2020.10a
- pp.563-568
- 2020
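In this study, we applied explainer models that can provide explanatory grounds for deep learning models, which are known as black boxes. For sentiment analysis of movie reviews, we built sentiment classifiers using deep learning models composed of MLP and CNN, as well as a Gradient Boosting model, which is an ensemble of decision trees. As explainer models, we used IG, which is based on gradients; CAM, which is based on the weights between layers; LIME, which uses an explainable surrogate model; and SHAP, which estimates a linear model over the input attributes. To examine the characteristics of the explainer models, we extracted heatmaps and the N most relevant attributes. We also performed a quantitative evaluation that measures the change in classifier performance as input attributes are removed in order of the contributions provided by the explainer. In addition, we proposed and applied a new evaluation method, called 'explanation-basis accuracy', which examines how well the explanations agree with the grounds of human judgment.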
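The quantitative evaluation described in the abstract (removing input attributes in order of the explainer's contributions and tracking the classifier's response) can be sketched for a single review as below. As a stand-in for IG, CAM, LIME, and SHAP, this assumes a hypothetical linear bag-of-words scorer, for which gradient × input and Integrated Gradients with an all-zero baseline both reduce to each token's weight; the vocabulary, weights, and function names are illustrative only. Averaging such curves (or accuracy after each removal step) over a test set would give a dataset-level measurement.

```python
import math

# Hypothetical linear sentiment scorer: a positive score means a positive review
WEIGHTS = {"great": 2.0, "fun": 1.0, "plot": 0.1, "boring": -2.0, "awful": -3.0}

def score(tokens):
    return sum(WEIGHTS.get(t, 0.0) for t in tokens)

def prob_positive(tokens):
    return 1.0 / (1.0 + math.exp(-score(tokens)))

def attributions(tokens):
    """Contribution of each token position. For a linear scorer over token presence,
    gradient x input and Integrated Gradients (zero baseline) both equal the weight."""
    return {i: WEIGHTS.get(t, 0.0) for i, t in enumerate(tokens)}

def deletion_curve(tokens):
    """Remove tokens in order of support for the predicted class and record the
    predicted-class probability after each removal (a steeper early drop suggests
    a more faithful explanation)."""
    positive = score(tokens) > 0

    def p_pred(toks):
        p = prob_positive(toks)
        return p if positive else 1.0 - p

    attr = attributions(tokens)
    sign = 1.0 if positive else -1.0
    order = sorted(attr, key=lambda i: sign * attr[i], reverse=True)
    curve, removed = [p_pred(tokens)], set()
    for i in order:
        removed.add(i)
        curve.append(p_pred([t for j, t in enumerate(tokens) if j not in removed]))
    return curve

curve = deletion_curve(["great", "plot", "fun", "boring"])
print([round(p, 3) for p in curve])
```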

Motion classification using distributional features of 3D skeleton data

Woohyun Kim;Daeun Kim;Kyoung Shin Park;Sungim Lee
- Communications for Statistical Applications and Methods
- v.30 no.6
- pp.551-560
- 2023
Recently, there has been significant research into recognizing human activities from three-dimensional sequential skeleton data captured by the Kinect depth sensor, and many of these studies employ deep learning models. This study introduces a novel feature selection method for such data and analyzes it using machine learning models. Because the original Kinect data are high-dimensional, effective feature extraction methods are needed for the classification task. In this research, we propose using the first four moments as predictors to represent the distribution of joint sequences and evaluate their effectiveness on two datasets: the exergame dataset, consisting of three activities, and the MSR daily activity dataset, composed of ten activities. The results show that, on average across different classifiers, our approach achieves higher accuracy than existing methods.
https://doi.org/10.29220/CSAM.2023.30.6.551
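A minimal sketch of turning a variable-length skeleton sequence into a fixed-length vector of first-four-moment features, as the abstract describes. It assumes the moments are computed separately for each joint's x, y, and z time series and uses the mean, variance, skewness, and excess kurtosis (population estimators); the abstract does not state the exact moment definitions, and the function names and toy frames are hypothetical.

```python
def four_moments(series):
    """Mean, variance, skewness, and excess kurtosis of one coordinate's time series
    (population / biased estimators)."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    if m2 == 0:
        return [mean, 0.0, 0.0, 0.0]  # constant series: higher moments undefined, use 0
    return [mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0]

def skeleton_features(frames):
    """frames: list of frames, each a list of (x, y, z) joint positions.
    Returns 4 moments x 3 axes x number of joints, whatever the number of frames."""
    n_joints = len(frames[0])
    features = []
    for j in range(n_joints):
        for axis in range(3):
            series = [frame[j][axis] for frame in frames]
            features.extend(four_moments(series))
    return features

# Toy sequence: 4 frames, 2 joints; joint 0 moves along x, joint 1 stays still
frames = [[(t, 0, 0), (1, 2, 3)] for t in range(4)]
features = skeleton_features(frames)
print(len(features), features[:4])
```

The feature length depends only on the number of joints, so sequences of different durations map to vectors that any standard classifier can consume.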

Diagnosing Reading Disorders based on Eye Movements during Natural Reading

Yongseok Yoo
- Journal of information and communication convergence engineering
- v.21 no.4
- pp.281-286
- 2023
Diagnosing reading disorders involves elaborate procedures for evaluating complex cognitive processes. An accurate diagnosis requires a series of tests and evaluations by human experts. In this study, we propose a quantitative tool that diagnoses reading disorders from natural reading behavior with minimal human input. The eye movements of third- and fourth-grade students were recorded while they read a text at their own pace. Seven machine learning models were used to evaluate the gaze patterns on the words of the presented text and to classify each student as normal or as having a reading disorder. The accuracy of the machine learning-based diagnosis was measured using the diagnosis by human experts as the ground truth. The highest accuracy, 0.8, was achieved by the support vector machine and random forest classifiers. This result demonstrates that machine learning-based automated diagnosis could substitute for the traditional diagnosis of reading disorders and enable large-scale screening of students at an early age.
https://doi.org/10.56977/jicce.2023.21.4.281

Classification of COVID-19 Disease: A Machine Learning Perspective

Kinza Sardar
- International Journal of Computer Science & Network Security
- v.24 no.3
- pp.107-112
- 2024
COVID-19, a deadly virus that originated in Wuhan, China, in 2019, has spread all over the world and affected millions of people in a very short time. Because COVID-19 has so many symptoms, identifying a person infected with the virus is a difficult task; in particular, it is challenging to determine whether an individual's COVID test is positive or negative. We developed a framework that uses machine learning techniques. The proposed method uses DecisionTree, KNearestNeighbors, GaussianNB, LogisticRegression, BernoulliNB, and RandomForest machine learning methods as classifiers for diagnosing COVID-19, and 5-fold and 10-fold cross-validation were applied during the classification process. The experimental results showed that the best accuracy was obtained by the Decision Tree classifier. Data preprocessing techniques were applied to improve classification performance, and recall, accuracy, precision, and F-score were used to evaluate it. In future work, we will apply different techniques to improve the model's accuracy beyond the 93 percent achieved so far.
https://doi.org/10.22937/IJCSNS.2024.24.3.13
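The 5-fold and 10-fold cross-validation mentioned in the abstract can be sketched as follows: each fold is held out once for testing while the remaining folds are used for training, and the fold accuracies are averaged. The seeded shuffle, the unstratified split, and the 1-nearest-neighbour stand-in (for the KNearestNeighbors model listed) are assumptions, and the toy data are hypothetical; the abstract describes neither its dataset nor its implementation.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 once, then split into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(idx[start:start + size])
        start += size
    return folds

def one_nn_predict(train_X, train_y, x):
    """1-nearest-neighbour prediction using squared Euclidean distance."""
    best = min(range(len(train_X)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    return train_y[best]

def cross_val_accuracy(X, y, k):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(one_nn_predict(train_X, train_y, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k

# Toy data: two well-separated groups of 2-D points (synthetic, for illustration)
X = [(0.1 * i, 0.0) for i in range(10)] + [(5.0 + 0.1 * i, 5.0) for i in range(10)]
y = [0] * 10 + [1] * 10
for k in (5, 10):
    print(k, cross_val_accuracy(X, y, k))
```

Any of the listed classifiers could replace `one_nn_predict`; with imbalanced classes, a stratified split that preserves class proportions in every fold is a common alternative.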

Term Frequency-Inverse Document Frequency (TF-IDF) Technique Using Principal Component Analysis (PCA) with Naive Bayes Classification

J.Uma;K.Prabha
- International Journal of Computer Science & Network Security
- v.24 no.4
- pp.113-118
- 2024
Sentiment analysis on Twitter is difficult to perform, even though it is widely used for reviews. This is because tweets are extremely short and mostly contain slang, emoticons, and hashtags alongside other words. Feature extraction covers any technique that derives structural and aspect-level features from individual tweets. Each element of a feature vector is an integer that contributes to assigning a sentiment class to a tweet. The goal of feature extraction is to extract the relevant qualities of the text in order to improve the accuracy of the classification models. In this manuscript, we propose using the Term Frequency-Inverse Document Frequency (TF-IDF) method together with Principal Component Analysis (PCA) and Naïve Bayes classifiers. For the classification process, the proposed approach can produce different features from the highly valued features of a Twitter dataset.
https://doi.org/10.22937/IJCSNS.2024.24.4.12
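A minimal sketch of the TF-IDF weighting step that the abstract places before PCA and naive Bayes. It assumes one common variant (term frequency = count / document length, idf = log(N / df)); the abstract does not say which variant the authors use, and the function name and toy tweets are hypothetical. The resulting rows would then be passed to PCA and a naive Bayes classifier.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns (vocabulary, matrix) with
    matrix[d][t] = (count of t in d / length of d) * log(N / df(t))."""
    n_docs = len(docs)
    vocab = sorted({tok for doc in docs for tok in doc})
    df = Counter(tok for doc in docs for tok in set(doc))  # document frequency per term
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for an empty tweet
        matrix.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, matrix

tweets = [
    ["love", "this", "phone", "#happy"],
    ["hate", "this", "phone"],
    ["love", "love", "it"],
]
vocab, X = tf_idf(tweets)
print(vocab)
```

Terms that appear in every document (df = N) receive an idf of zero, which is how TF-IDF down-weights uninformative words; hashtags and emoticons are treated as ordinary tokens here.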
