• Title/Abstract/Keywords: Machine Learning Techniques

Search results: 1,117 items (processing time: 0.023 seconds)

PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Genomics & Informatics
    • /
    • Vol. 2, No. 2
    • /
    • pp.99-106
    • /
    • 2004
  • In this paper, we introduce PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning-based data mining techniques to mine useful biological information, such as protein-protein interactions, from the massive literature. The system recognizes biological terms such as genes, proteins, and enzymes and extracts their interactions described in the document through natural language processing. The extracted interactions are further analyzed with a set of features of each entity, collected from related public databases, to infer more interactions from the original interactions. The inferred interactions from the interaction analysis and the native interactions are provided to the user with links to the literature sources. The performance of entity and interaction extraction was tested with selected MEDLINE abstracts. The inference was evaluated using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.

Optimizing Artificial Neural Network-Based Models to Predict Rice Blast Epidemics in Korea

  • Lee, Kyung-Tae;Han, Juhyeong;Kim, Kwang-Hyung
    • The Plant Pathology Journal
    • /
    • Vol. 38, No. 4
    • /
    • pp.395-402
    • /
    • 2022
  • To predict rice blast, many machine learning methods have been proposed. As the quality and quantity of input data are essential for machine learning techniques, this study develops three artificial neural network (ANN)-based rice blast prediction models by combining two ANN models, the feed-forward neural network (FFNN) and long short-term memory, with diverse input datasets, and compares their performance. The Blast_Weather_FFNN model had the highest recall score (66.3%) for rice blast prediction. This model requires two types of input data: blast occurrence data for the last 3 years and weather data (daily maximum temperature, relative humidity, and precipitation) between January and July of the prediction year. This study showed that the performance of an ANN-based disease prediction model was improved by applying suitable machine learning techniques together with the optimization of hyperparameter tuning involving input data. Moreover, we highlight the importance of the systematic collection of long-term disease data.

Shield TBM disc cutter replacement and wear rate prediction using machine learning techniques

  • Kim, Yunhee;Hong, Jiyeon;Shin, Jaewoo;Kim, Bumjoo
    • Geomechanics and Engineering
    • /
    • Vol. 29, No. 3
    • /
    • pp.249-258
    • /
    • 2022
  • A disc cutter is an excavation tool on a tunnel boring machine (TBM) cutterhead; it crushes and cuts rock mass while the machine excavates using the cutterhead's rotational movement. Disc cutter wear occurs naturally. Thus, along with the management of downtime and excavation efficiency, worn disc cutters need to be replaced at the proper time; otherwise, the construction period could be delayed and the cost could increase. The most common prediction models for TBM performance and for the disc cutter lifetime have been proposed by the Colorado School of Mines and the Norwegian University of Science and Technology. However, the design parameters of existing models do not correspond well to the field values when a TBM encounters complex and difficult ground conditions. Thus, this study proposes a series of machine learning models to predict the disc cutter lifetime of a shield TBM using the excavation (machine) data recorded during operation, which reflect the machine's response to the rock mass. This study utilizes five different machine learning techniques: four types of classification models (i.e., K-Nearest Neighbors (KNN), Support Vector Machine, Decision Tree, and Stacking Ensemble Model) and one artificial neural network (ANN) model. The KNN model was found to be the best model among the four classification models, affording the highest recall of 81%. The ANN model also predicted the wear rate of disc cutters reasonably well.
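
As a rough illustration of the KNN classification and recall evaluation mentioned in this abstract, the sketch below uses only the Python standard library. The feature pairs, the labels, and the names `knn_predict` and `recall` are hypothetical and are not taken from the paper; the study's actual features and settings are not given in the abstract.

```python
from collections import Counter
import math


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query.
    neighbors = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


def recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical toy data: two machine readings per sample,
# label 1 = cutter should be replaced, label 0 = cutter can stay.
train_X = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.9, 3.0)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(1.1, 1.0), (3.0, 3.0), (2.8, 3.1)]
test_y = [0, 1, 1]

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print(predictions, recall(test_y, predictions))
```

Recall is a natural metric here because a missed replacement (a false negative) is presumably the costlier error.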

Stress Identification and Analysis using Observed Heart Beat Data from Smart HRM Sensor Device

  • Pramanta, SPL Aditya;Kim, Myonghee;Park, Man-Gon
    • Journal of Korea Multimedia Society (한국멀티미디어학회논문지)
    • /
    • Vol. 20, No. 8
    • /
    • pp.1395-1405
    • /
    • 2017
  • In this paper, we analyze heartbeat data to identify each subject's stress state (binary) using heart rate variability (HRV) features extracted from the subjects' heartbeat data, and we implement supervised machine learning techniques to create a mental stress classifier. Four steps must be completed before performance measurement: data acquisition, data processing (HRV analysis), feature selection, and machine learning. The HRV analysis module generates 56 features, several of which are selected (using our own algorithm) after computing the Pearson correlation matrix (p-values). The selected features are compared with the full feature set by model error after training with several machine learning techniques: support vector machine, decision tree, and discriminant analysis. The SVM and decision tree models using the selected features show results close to those using all features, differing by only 1%. Meanwhile, the discriminant analysis differs by about 5%. All the machine learning methods used in this work have a maximum average accuracy of 90%.
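
The abstract does not describe the authors' own selection algorithm, so the following is only a minimal sketch of one common correlation-based filter. It assumes that features are kept when the absolute Pearson coefficient with the binary stress label meets a threshold. The feature names (`rmssd`, `noise`), the threshold, and the data are hypothetical, and the p-value step mentioned in the abstract is not reproduced.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(features, labels, threshold=0.5):
    """Keep feature names whose |r| with the label meets the threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, labels)) >= threshold
    ]


# Hypothetical HRV-style features for six recordings; 1 = stressed, 0 = relaxed.
features = {
    "rmssd": [40, 42, 20, 18, 41, 19],
    "noise": [5, 1, 3, 4, 2, 3],
}
labels = [0, 0, 1, 1, 0, 1]

print(select_features(features, labels))
```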

Classification of Fall Direction Before Impact Using Machine Learning Based on IMU Raw Signals

  • 이현빈;이창준;이정근
    • Journal of Sensor Science and Technology (센서학회지)
    • /
    • Vol. 31, No. 2
    • /
    • pp.96-101
    • /
    • 2022
  • As the elderly population gradually increases, the risk of fatal fall accidents among the elderly is increasing. One way to cope with a fall accident is to determine the fall direction before impact using a wearable inertial measurement unit (IMU). In this context, a previous study proposed a method of classifying fall directions using a support vector machine with sensor velocity, acceleration, and tilt angle as input parameters. However, in this method, the IMU signals are processed through several processes, including a Kalman filter and the integration of acceleration, which involves a large amount of computation and error factors. Therefore, this paper proposes a machine learning-based method that classifies the fall direction before impact using IMU raw signals rather than processed data. In this study, we investigated the effects of the following two factors on the classification performance: (1) the usage of processed/raw signals and (2) the selection of machine learning techniques. First, as a result of comparing the processed/raw signals, the difference in sensitivities between the two methods was within 5%, indicating an equivalent level of classification performance. Second, as a result of comparing six machine learning techniques, K-nearest neighbor and naive Bayes exhibited excellent performance with a sensitivity of 86.0% and 84.1%, respectively.

Machine learning application to seismic site classification prediction model using Horizontal-to-Vertical Spectral Ratio (HVSR) of strong-ground motions

  • Francis G. Phi;Bumsu Cho;Jungeun Kim;Hyungik Cho;Yun Wook Choo;Dookie Kim;Inhi Kim
    • Geomechanics and Engineering
    • /
    • Vol. 37, No. 6
    • /
    • pp.539-554
    • /
    • 2024
  • This study explores the development of a prediction model for seismic site classification by integrating machine learning techniques with horizontal-to-vertical spectral ratio (HVSR) methodologies. To improve model accuracy, the research employs outlier detection methods and the synthetic minority over-sampling technique (SMOTE) for data balancing, and evaluates seven machine learning models using seismic data from KiK-net. Notably, light gradient boosting method (LGBM), gradient boosting, and decision tree models exhibit improved performance when coupled with SMOTE, while multiple linear regression (MLR) and support vector machine (SVM) models show reduced efficacy. Outlier detection techniques significantly enhance accuracy, particularly for LGBM, gradient boosting, and voting boosting. The ensemble of LGBM with the isolation forest and SMOTE achieves the highest accuracy of 0.91, with LGBM and local outlier factor yielding the highest F1-score of 0.79. Consistently outperforming other models, LGBM proves most efficient for seismic site classification when supported by appropriate preprocessing procedures. These findings show the significance of outlier detection and data balancing for precise seismic soil classification prediction, offering insights and highlighting the potential of machine learning in optimizing site classification accuracy.
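
To make the SMOTE step concrete, here is a simplified sketch of its core idea: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbors. This is not the paper's pipeline or a library implementation. The function name `smote`, its parameters, and the sample points are assumptions for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by interpolating toward neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbors of the base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(p, base))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical minority-class samples (two features each).
minority = [(1.0, 1.0), (1.5, 1.2), (1.2, 1.8)]
new_points = smote(minority, n_synthetic=4)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, oversampling should be applied to the training split only, so that interpolated points do not leak into evaluation data.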

Underwater Acoustic Research Trends with Machine Learning: Active SONAR Applications

  • Yang, Haesang;Byun, Sung-Hoon;Lee, Keunhwa;Choo, Youngmin;Kim, Kookhyun
    • Journal of Ocean Engineering and Technology (한국해양공학회지)
    • /
    • Vol. 34, No. 4
    • /
    • pp.277-284
    • /
    • 2020
  • Underwater acoustics, which is the study of phenomena related to sound waves in water, has been applied mainly in research on the use of sound navigation and ranging (SONAR) systems for communication, target detection, investigation of marine resources and environments, and noise measurement and analysis. The main objective of underwater acoustic remote sensing is to obtain information on a target object indirectly by using acoustic data. Presently, various types of machine learning techniques are being widely used to extract information from acoustic data. The machine learning techniques typically used in underwater acoustics and their applications in passive SONAR systems were reviewed in the first two parts of this work (Yang et al., 2020a; Yang et al., 2020b). As a follow-up, this paper reviews machine learning applications in SONAR signal processing with a focus on active target detection and classification.

Development of Prediction Model of Chloride Diffusion Coefficient using Machine Learning

  • 김현수
    • Journal of Korean Association for Spatial Structures (한국공간구조학회논문집)
    • /
    • Vol. 23, No. 3
    • /
    • pp.87-94
    • /
    • 2023
  • Chloride is one of the most common threats to reinforced concrete (RC) durability. The alkaline environment of concrete forms a passive layer on the surface of reinforcement bars that protects the bars from corrosion. However, when the chloride concentration at the reinforcement bar reaches a certain level, the passive protection layer deteriorates, causing corrosion and ultimately reducing the structure's safety and durability. Therefore, understanding and predicting chloride diffusion are important for evaluating the safety and durability of RC structures. In this study, the chloride diffusion coefficient is predicted by machine learning techniques. Various machine learning techniques, such as multiple linear regression, decision tree, random forest, support vector machine, artificial neural networks, extreme gradient boosting, and k-nearest neighbor, were used, and the accuracy of these models was compared. To evaluate the accuracy, root mean square error (RMSE), mean square error (MSE), mean absolute error (MAE), and coefficient of determination (R²) were used as prediction performance indices. The k-fold cross-validation procedure was used to estimate the performance of the machine learning models when making predictions on data not used during training. Grid search was applied for hyperparameter optimization. Numerical simulation showed that ensemble learning methods such as random forest and extreme gradient boosting successfully predicted the chloride diffusion coefficient, and artificial neural networks also provided accurate results.
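
The four performance indices and the k-fold split named in this abstract can be sketched in plain Python as follows. The function names and sample values are illustrative assumptions. The abstract does not report the study's data, fold count, or search grid, and none of those are reproduced here.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R2 for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot  # undefined if y_true is constant (ss_tot == 0)
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical true and predicted values.
metrics = regression_metrics([2.0, 4.0, 6.0, 8.0], [3.0, 4.0, 5.0, 8.0])
print(metrics)
for train_idx, test_idx in k_fold_indices(7, 3):
    print(train_idx, test_idx)
```

In a grid search, each candidate hyperparameter setting would be scored by averaging one of these metrics over the folds, and the best-scoring setting would be kept.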

Wine Quality Prediction by Using Backward Elimination Based on XGBoosting Algorithm

  • Umer Zukaib;Mir Hassan;Tariq Khan;Shoaib Ali
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 24, No. 2
    • /
    • pp.31-42
    • /
    • 2024
  • Many industries rely on quality certification to promote their products or brands. However, obtaining quality certification, particularly from human experts, is difficult. Machine learning plays a vital role in every aspect of life, and it has many applications in assigning and assessing quality certifications for different products at a macro level. Like other products, wine comes in different brands, and machine learning plays an important role in ensuring wine quality. In this research, we use two publicly available datasets from the "UC Irvine machine learning repository" to predict wine quality. The datasets comprise red wine and white wine data, with 1599 records for red wine and 4898 records for white wine. The study was twofold. First, we used a technique called backward elimination to determine the dependency of the dependent variable on the independent variables and to predict the dependent variable; the technique is useful for identifying which independent variable has the maximum probability of improving wine quality. Second, we used a robust machine learning algorithm known as "XGBoost" for efficient prediction of wine quality. We evaluate our model using error measures: root mean square error, mean absolute error, R² error, and mean square error. We compared the results generated by "XGBoost" with those of other state-of-the-art machine learning techniques; the experimental results showed that "XGBoost" outperforms them.

Exploring the Predictive Variables of Government Statistical Indicators on Retail Sales Using Machine Learning: Focusing on Pharmacy

  • 이광수
    • Journal of Internet Computing and Services (인터넷정보학회논문지)
    • /
    • Vol. 23, No. 3
    • /
    • pp.125-135
    • /
    • 2022
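  • This study uses machine learning to explore whether government statistical indicators, which were established to foster an industrial ecosystem based on data, networks, and artificial intelligence, affect pharmacy sales, and it aims to provide an analysis technique suitable for predicting pharmacy sales. To this end, the study used data from January 2016 to December 2021 covering 28 government statistical indicators and pharmacies, a retail business category, and explored predictive variables and performance using the machine learning techniques random forest, XGBoost, LightGBM, and CatBoost. The analysis showed that the economy-related indicators, namely the economic sentiment index, the cyclical component of the composite coincident index, and the consumer sentiment index, were important variables affecting pharmacy sales. In terms of the regression metrics MAE, MSE, and RMSE, random forest outperformed XGBoost, LightGBM, and CatBoost. Based on these machine learning results, the study presents the variables that affect pharmacy sales and the best-performing machine learning technique, and it suggests several implications and directions for follow-up research.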