• Title/Summary/Keyword: K-Nearest Neighbors

Explainable analysis of the Relationship between Hypertension with Gas leakages (설명 가능한 인공지능 기술을 활용한 가스누출과 고혈압의 연관 분석)

  • Dashdondov, Khongorzul;Jo, Kyuri;Kim, Mi-Hye
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2022.11a
    • /
    • pp.55-56
    • /
    • 2022
  • Hypertension is a severe health problem that increases the risk of other health issues, such as heart disease, heart attack, and stroke. In this research, we propose a machine learning-based method for predicting the risk of chronic hypertension. The proposed method consists of four main modules. In the first module, linear interpolation fills missing values in the integrated gas and meteorological datasets. In the second module, OrdinalEncoder-based normalization is followed by a Decision Tree algorithm to select important features. The prediction analysis module builds three models based on k-Nearest Neighbors, Decision Tree, and Random Forest to predict hypertension levels. Finally, the features used in the prediction model are explained by the DeepSHAP approach. The proposed method is evaluated on an integration of the Korean meteorological agency dataset, a natural gas leakage dataset, and the Korean National Health and Nutrition Examination Survey dataset. The experimental results showed important global features for hypertension across the entire population and local components for particular patients. Based on the local explanation results for a randomly selected 65-year-old male, the effect of hypertension increased from 0.694 to 1.249 when age increased by 0.37 and gas loss increased by 0.17. Therefore, it is concluded that gas loss is the cause of high blood pressure.
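The first module in the abstract above fills missing values by linear interpolation. The following is a minimal sketch of that idea on a single numeric series, not the authors' implementation; the function name `linear_interpolate` and the sample readings are hypothetical, and leaving leading or trailing gaps unfilled is an assumption made here for simplicity.

```python
from typing import List, Optional


def linear_interpolate(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) by linear interpolation between known neighbors.

    Leading or trailing gaps have no known value on one side and stay None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


# Hypothetical sensor readings with gaps (not data from the paper).
readings = [1.0, None, None, 4.0, None, 6.0, None]
print(linear_interpolate(readings))
```

Interior gaps are replaced by values on the straight line between the nearest known readings, while the final reading stays missing because nothing follows it.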

Assessment of wall convergence for tunnels using machine learning techniques

  • Mahmoodzadeh, Arsalan;Nejati, Hamid Reza;Mohammadi, Mokhtar;Ibrahim, Hawkar Hashim;Mohammed, Adil Hussein;Rashidi, Shima
    • Geomechanics and Engineering
    • /
    • v.31 no.3
    • /
    • pp.265-279
    • /
    • 2022
  • Tunnel convergence prediction is essential for the safe construction and design of tunnels. This study applies five machine learning models, namely deep neural network (DNN), K-nearest neighbors (KNN), Gaussian process regression (GPR), support vector regression (SVR), and decision trees (DT), to predict the convergence phenomenon during or shortly after the excavation of tunnels. For this purpose, a database of 650 datasets (440 for training, 110 for validation, and 100 for test) was gathered from previously constructed tunnels. The database covers 12 parameters affecting tunnel convergence and one target, tunnel wall convergence. Both 5-fold and hold-out cross-validation methods were used to analyze the predicted outcomes of the ML models. Finally, the DNN method was proposed as the most robust model. In addition, the backward selection method was used to assess each parameter's contribution to the prediction problem. The results showed that the parameters with the highest and lowest impact on tunnel convergence are tunnel depth and tunnel width, respectively.
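The abstract above evaluates its regressors with 5-fold cross-validation. As a rough illustration only, the sketch below runs 5-fold cross-validation for a plain KNN regressor and reports the mean RMSE. The data are synthetic stand-ins rather than the paper's tunnel database, the helper names are hypothetical, and the folds are interleaved without shuffling, which is a simplification.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (feature vector, target value)


def knn_predict(train: List[Sample], x: Sequence[float], k: int) -> float:
    """Average the targets of the k training samples closest to x."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def k_fold_indices(n: int, folds: int) -> List[List[int]]:
    """Split indices 0..n-1 into `folds` interleaved, near-equal parts."""
    return [list(range(f, n, folds)) for f in range(folds)]


def cross_val_rmse(data: List[Sample], k: int, folds: int = 5) -> float:
    """Mean RMSE over the folds, each used once as the held-out part."""
    scores = []
    for test_idx in k_fold_indices(len(data), folds):
        held_out = set(test_idx)
        train = [s for i, s in enumerate(data) if i not in held_out]
        errors = [(knn_predict(train, data[i][0], k) - data[i][1]) ** 2
                  for i in test_idx]
        scores.append(math.sqrt(sum(errors) / len(errors)))
    return sum(scores) / len(scores)


# Synthetic stand-in data: target = 2 * depth + width (an invented relation).
data = [((float(d), float(w)), 2.0 * d + w) for d in range(10) for w in range(5)]
print(cross_val_rmse(data, k=3))
```

A hold-out evaluation would correspond to a single fixed split instead of the loop over folds.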

Comparative Evaluation of Machine Learning Models for Predicting Soccer Injury Types

  • Davronbek Malikov;Jaeho Kim;Jung Kyu Park
    • Journal of the Korean Society of Industry Convergence
    • /
    • v.27 no.2_1
    • /
    • pp.257-268
    • /
    • 2024
  • Soccer is a type of sport that carries a high risk of injury. An injury not only harms the unlucky player's career but can also worsen team performance and finances, since soccer is a team-based game. The duration of recovery from a soccer injury typically depends on its type and severity. Therefore, in this paper we use machine learning to predict the probability of a player's injury type. Furthermore, we compare different machine learning models to find the best-fitting model. This paper uses various supervised classification models, including Decision Tree, Random Forest, K-Nearest Neighbors (KNN), and Naive Bayes. Based on our findings, the KNN and Decision Tree models achieved the highest accuracy at 70%, surpassing the other models. The Random Forest model followed with an accuracy of 62%. Among the evaluated models, the Naive Bayes model had the lowest accuracy at 56%. We gathered information about 54 professional soccer players who are playing in the top five European leagues, based on their career history.

Analysis of disc cutter replacement based on wear patterns using artificial intelligence classification models

  • Yunhee Kim;Jaewoo Shin;Bumjoo Kim
    • Geomechanics and Engineering
    • /
    • v.38 no.6
    • /
    • pp.633-645
    • /
    • 2024
  • Disc cutters, used as excavation tools for rocks in a Tunnel Boring Machine (TBM), naturally undergo wear during the tunneling process, involving crushing and cutting through the ground, leading to various wear types. When disc cutters reach their wear limits, they must be replaced at the appropriate time to ensure efficient excavation. General disc cutter life prediction models are typically used during the design phase to predict the total required quantity and replacement locations for construction. However, disc cutters are replaced more frequently during tunneling than initially planned. Unpredictable disc cutter replacements can easily diminish tunneling efficiency, and abnormal wear is a common cause during tunneling in complex ground conditions. This study aims to overcome the limitations of existing disc cutter life prediction models by utilizing machine data generated during tunneling to predict disc cutter wear patterns and determine the need for replacements in real-time. Artificial intelligence classification algorithms, including K-nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT), and Stacking, are employed to assess the need for disc cutter replacement. Binary classification models are developed to predict which disc cutters require replacement, while multi-class classification models are fine-tuned to identify three categories: no replacement required, replacement due to normal wear, and replacement due to abnormal wear during tunneling. The performance of these models is thoroughly assessed, demonstrating that the proposed approach effectively manages disc cutter wear and replacements in shield TBM tunnel projects.

Machine-learning Approaches with Multi-temporal Remotely Sensed Data for Estimation of Forest Biomass and Forest Reference Emission Levels (시계열 위성영상과 머신러닝 기법을 이용한 산림 바이오매스 및 배출기준선 추정)

  • Yong-Kyu, Lee;Jung-Soo, Lee
    • Journal of Korean Society of Forest Science
    • /
    • v.111 no.4
    • /
    • pp.603-612
    • /
    • 2022
  • This study aimed to evaluate a machine-learning-based forest biomass estimation model for estimating subnational forest biomass and to comparatively analyze REDD+ forest reference emission levels. Time-series Landsat satellite imagery and ESA Biomass Climate Change Initiative information were used to build the biomass estimation model. The k-nearest neighbors algorithm (kNN), a non-parametric learning model, and the tree-based random forest (RF) model were applied, and the estimated biomasses were compared with the forest reference emission level (FREL) data provided by the Paraguayan government. The root mean square error (RMSE) of the kNN model at its optimal parameter setting was 35.9, and the RMSE of the RF model was lower at 34.41, showing that the RF model was superior. When the FREL, kNN, and RF methods were used separately to set the reference emission levels, the gradient was approximately -33,000 tons, -253,000 tons, and -92,000 tons, respectively. These results showed that the machine learning-based estimation model was more suitable than the existing methods for setting reference emission levels.

Improving minority prediction performance of support vector machine for imbalanced text data via feature selection and SMOTE (단어선택과 SMOTE 알고리즘을 이용한 불균형 텍스트 데이터의 소수 범주 예측성능 향상 기법)

  • Jongchan Kim;Seong Jun Chang;Won Son
    • The Korean Journal of Applied Statistics
    • /
    • v.37 no.4
    • /
    • pp.395-410
    • /
    • 2024
  • Text data is usually made up of a wide variety of unique words. Even in standard text data, it is common to find tens of thousands of different words. In text data analysis, usually, each unique word is treated as a variable. Thus, text data can be regarded as a dataset with a large number of variables. On the other hand, in text data classification, we often encounter class label imbalance problems. In the cases of substantial imbalances, the performance of conventional classification models can be severely degraded. To improve the classification performance of support vector machines (SVM) for imbalanced data, algorithms such as the Synthetic Minority Over-sampling Technique (SMOTE) can be used. The SMOTE algorithm synthetically generates new observations for the minority class based on the k-Nearest Neighbors (kNN) algorithm. However, in datasets with a large number of variables, such as text data, errors may accumulate. This can potentially impact the performance of the kNN algorithm. In this study, we propose a method for enhancing prediction performance for the minority class of imbalanced text data. Our approach involves employing variable selection to generate new synthetic observations in a reduced space, thereby improving the overall classification performance of SVM.
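The abstract above relies on SMOTE, which creates synthetic minority observations by interpolating between a minority point and one of its k nearest minority neighbors. The sketch below shows that core step in a simplified form. It is not the authors' code: the function name, the toy points, and the default `k` are assumptions. The paper's proposal would correspond to running such a step only on the selected variables.

```python
import math
import random
from typing import List, Sequence


def smote(minority: List[Sequence[float]], n_new: int, k: int = 2,
          seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority points by interpolating toward neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of the base point, excluding the point itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy minority class in two dimensions (hypothetical values).
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
for point in smote(minority, n_new=3):
    print(point)
```

Because each synthetic point depends on nearest-neighbor distances, noisy distances in a space with tens of thousands of word variables can degrade the generated samples, which is the motivation the abstract gives for reducing the space first.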

CNN-based Adaptive K for Improving Positioning Accuracy in W-kNN-based LTE Fingerprint Positioning

  • Kwon, Jae Uk;Chae, Myeong Seok;Cho, Seong Yun
    • Journal of Positioning, Navigation, and Timing
    • /
    • v.11 no.3
    • /
    • pp.217-227
    • /
    • 2022
  • To provide location-based services in both indoor and outdoor spaces, it is important to provide the position of the terminal regardless of location. Among the wireless/mobile communication resources used for this purpose, the Long Term Evolution (LTE) signal is a representative infrastructure that can overcome spatial limitations, but positioning based on the location of the base station has the disadvantage of low accuracy. Therefore, fingerprinting, a pattern recognition technique, has been widely used. The simplest yet widely applied algorithm among fingerprint positioning technologies is k-Nearest Neighbors (kNN). However, in the kNN algorithm it is difficult to find the optimal K value with the lowest positioning error for each location to be estimated, so K is generally fixed to an appropriate value. Because the optimal K value cannot be applied to each estimated location, the accuracy of the overall estimated location information is lowered. To address this problem, this paper proposes a technique for adaptively varying the K value using a Convolutional Neural Network (CNN), one of the Artificial Neural Network (ANN) techniques. First, using the signal information of the measurements obtained in the service area, an image is created according to the Physical Cell Identity (PCI) and Band combination, and an answer label for supervised learning is created. Then, the CNN is structured to classify K values from the image information of the measurements. The performance of the proposed technique is verified on actual data measured in a testbed. The results show that the proposed technique improves positioning performance compared to using a fixed K value.
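To make the role of K in the abstract above concrete, the sketch below estimates a position with weighted kNN (W-kNN) over a small fingerprint database. It is an illustration under stated assumptions: inverse-distance weighting is one common weighting choice and may differ from the paper's, the signal values and coordinates are invented, and `k` is passed in by hand rather than predicted by a CNN.

```python
import math
from typing import List, Sequence, Tuple

# (signal vector, (x, y) position where it was recorded)
Fingerprint = Tuple[Sequence[float], Tuple[float, float]]


def wknn_position(db: List[Fingerprint], measured: Sequence[float], k: int,
                  eps: float = 1e-9) -> Tuple[float, float]:
    """Estimate (x, y) as the inverse-distance-weighted mean of the k closest fingerprints."""
    nearest = sorted(db, key=lambda f: math.dist(f[0], measured))[:k]
    # eps avoids division by zero when a fingerprint matches the measurement exactly.
    weights = [1.0 / (math.dist(sig, measured) + eps) for sig, _ in nearest]
    total = sum(weights)
    x = sum(w * pos[0] for w, (_, pos) in zip(weights, nearest)) / total
    y = sum(w * pos[1] for w, (_, pos) in zip(weights, nearest)) / total
    return x, y


# Hypothetical database: two signal strengths per point, positions in meters.
db = [
    ((-80.0, -90.0), (0.0, 0.0)),
    ((-90.0, -80.0), (10.0, 0.0)),
    ((-70.0, -100.0), (0.0, 10.0)),
]
print(wknn_position(db, (-85.0, -85.0), k=2))
print(wknn_position(db, (-85.0, -85.0), k=3))
```

Changing `k` changes which fingerprints contribute to the estimate, which is why a fixed value cannot suit every location.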

A Study on Application of Machine Learning Algorithms to Visitor Marketing in Sports Stadium (기계학습 알고리즘을 사용한 스포츠 경기장 방문객 마케팅 적용 방안)

  • Park, So-Hyun;Ihm, Sun-Young;Park, Young-Ho
    • Journal of Digital Contents Society
    • /
    • v.19 no.1
    • /
    • pp.27-33
    • /
    • 2018
  • In this study, we analyze big data on visitors to a sports stadium from a marketing perspective and conduct research on providing customized marketing services to consumers. For this purpose, we derive groups of similar visitors using the K-means clustering method. We also use the K-nearest neighbors method to predict the store of interest for new visitors. The experiments showed that deriving groups of similar visitors through these two algorithms made it possible to provide marketing services suited to each group's attributes and to recommend products and events to new visitors.

A study on the imputation solution for missing speed data on UTIS by using adaptive k-NN algorithm (적응형 k-NN 기법을 이용한 UTIS 속도정보 결측값 보정처리에 관한 연구)

  • Kim, Eun-Jeong;Bae, Gwang-Soo;Ahn, Gye-Hyeong;Ki, Yong-Kul;Ahn, Yong-Ju
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.13 no.3
    • /
    • pp.66-77
    • /
    • 2014
  • UTIS (Urban Traffic Information System) directly collects link travel times in urban areas using probe vehicles. It can therefore estimate link travel speed more accurately than other traffic detection systems. However, UTIS data include missing values caused by a lack of probe vehicles and RSEs on the road network, system failures, and other factors. In this study, we propose a new model, based on the k-NN algorithm, for imputing missing data to provide more accurate travel time information. The new imputation model is an adaptive k-NN that flexibly adjusts the number of nearest neighbors (NN) depending on the distribution of candidate objects. The evaluation results indicate that the new model successfully imputed missing speed data and significantly reduced the imputation error compared with other models such as ARIMA. We plan to apply the new imputation model at the UTIS Central Traffic Information Center to improve the traffic information service.

A Study on Book Categorization in Social Sciences Using kNN Classifiers and Table of Contents Text (목차 정보와 kNN 분류기를 이용한 사회과학 분야 도서 자동 분류에 관한 연구)

  • Lee, Yong-Gu
    • Journal of the Korean Society for Information Management
    • /
    • v.37 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • This study applied automatic classification using table of contents (TOC) text to 6,253 social science books from a new-arrivals list collected by a university library. The k-nearest neighbors (kNN) algorithm was used as the classifier, and the ten divisions on the second level of DDC main class 300, as assigned to the books by the library, were used as classes (labels). The features were keywords extracted from the titles and TOCs of the books. The TOCs were obtained through the OpenAPI of an Internet bookstore. The results showed that the TOC features improved both classification recall and precision. With its rich features, the TOC was shown to reduce the overfitting problem of imbalanced data. Law and education have high topic specificity within the social sciences, so title features alone can yield good classification performance in these fields.
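The abstract above classifies books with kNN over keyword features and reports recall and precision. The sketch below illustrates that workflow on toy data. It is not the study's implementation: Jaccard similarity over keyword sets is an assumed similarity measure, and the keyword sets are invented. Only the labels 340 (law) and 370 (education) are real DDC divisions.

```python
from collections import Counter
from typing import List, Set, Tuple

Doc = Tuple[Set[str], str]  # (keywords, class label)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of keywords two books have in common (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_classify(train: List[Doc], keywords: Set[str], k: int) -> str:
    """Majority label among the k training books most similar to `keywords`."""
    nearest = sorted(train, key=lambda d: jaccard(d[0], keywords), reverse=True)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def precision_recall(true: List[str], pred: List[str], label: str) -> Tuple[float, float]:
    """Precision and recall for one class label."""
    tp = sum(t == label and p == label for t, p in zip(true, pred))
    fp = sum(t != label and p == label for t, p in zip(true, pred))
    fn = sum(t == label and p != label for t, p in zip(true, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy training set: keywords drawn from titles and TOCs (invented examples).
train = [
    ({"court", "statute", "contract"}, "340"),
    ({"statute", "criminal", "procedure"}, "340"),
    ({"school", "curriculum", "teacher"}, "370"),
    ({"teacher", "learning", "classroom"}, "370"),
]
print(knn_classify(train, {"statute", "contract", "appeal"}, k=3))
print(precision_recall(["340", "340", "370"], ["340", "370", "370"], "340"))
```

Adding TOC keywords enlarges each book's keyword set, which gives the similarity measure more overlap to work with than titles alone.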