• Title/Summary/Keyword: Scikit Learn library

Search Result 7, Processing Time 0.016 seconds

Course recommendation system using deep learning (딥러닝을 이용한 강좌 추천시스템)

  • Min-Ah Lim;Seung-Yeon Hwang;Dong-Jin Shin;Jae-Kon Oh;Jeong-Joon Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.23 no.3
    • /
    • pp.193-198
    • /
    • 2023
  • We study a learner-customized lecture recommendation project using deep learning. Recommendation systems can be easily found on the web and apps, and examples using this feature include recommending feature videos by clicking users and advertising items in areas of interest to users on SNS. In this study, the sentence similarity Word2Vec was mainly used to filter twice, and the course was recommended through the Surprise library. With this system, it provides users with the desired classification of course data conveniently and conveniently. Surprise Library is a Python scikit-learn-based library that is conveniently used in recommendation systems. By analyzing the data, the system is implemented at a high speed, and deeper learning is used to implement more precise results through course steps. When a user enters a keyword of interest, similarity between the keyword and the course title is executed, and similarity with the extracted video data and voice text is executed, and the highest ranking video data is recommended through the Surprise Library.

A Study on Applicability of Machine Learning for Book Classification of Public Libraries: Focusing on Social Science and Arts (공공도서관 도서 분류를 위한 머신러닝 적용 가능성 연구 - 사회과학과 예술분야를 중심으로 -)

  • Kwak, Chul Wan
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.32 no.1
    • /
    • pp.133-150
    • /
    • 2021
  • The purpose of this study is to identify the applicability of machine learning targeting titles in the classification of books in public libraries. Data analysis was performed using Python's scikit-learn library through the Jupiter notebook of the Anaconda platform. KoNLPy analyzer and Okt class were used for Hangul morpheme analysis. The units of analysis were 2,000 title fields and KDC classification class numbers (300 and 600) extracted from the KORMARC records of public libraries. As a result of analyzing the data using six machine learning models, it showed a possibility of applying machine learning to book classification. Among the models used, the neural network model has the highest accuracy of title classification. The study suggested the need for improving the accuracy of title classification, the need for research on book titles, tokenization of titles, and stop words.

Accuracy of Phishing Websites Detection Algorithms by Using Three Ranking Techniques

  • Mohammed, Badiea Abdulkarem;Al-Mekhlafi, Zeyad Ghaleb
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.2
    • /
    • pp.272-282
    • /
    • 2022
  • Between 2014 and 2019, the US lost more than 2.1 billion USD to phishing attacks, according to the FBI's Internet Crime Complaint Center, and COVID-19 scam complaints totaled more than 1,200. Phishing attacks reflect these awful effects. Phishing websites (PWs) detection appear in the literature. Previous methods included maintaining a centralized blacklist that is manually updated, but newly created pseudonyms cannot be detected. Several recent studies utilized supervised machine learning (SML) algorithms and schemes to manipulate the PWs detection problem. URL extraction-based algorithms and schemes. These studies demonstrate that some classification algorithms are more effective on different data sets. However, for the phishing site detection problem, no widely known classifier has been developed. This study is aimed at identifying the features and schemes of SML that work best in the face of PWs across all publicly available phishing data sets. The Scikit Learn library has eight widely used classification algorithms configured for assessment on the public phishing datasets. Eight was tested. Later, classification algorithms were used to measure accuracy on three different datasets for statistically significant differences, along with the Welch t-test. Assemblies and neural networks outclass classical algorithms in this study. On three publicly accessible phishing datasets, eight traditional SML algorithms were evaluated, and the results were calculated in terms of classification accuracy and classifier ranking as shown in tables 4 and 8. Eventually, on severely unbalanced datasets, classifiers that obtained higher than 99.0 percent classification accuracy. Finally, the results show that this could also be adapted and outperforms conventional techniques with good precision.

Prediction of East Asian Brain Age using Machine Learning Algorithms Trained With Community-based Healthy Brain MRI

  • Chanda Simfukwe;Young Chul Youn
    • Dementia and Neurocognitive Disorders
    • /
    • v.21 no.4
    • /
    • pp.138-146
    • /
    • 2022
  • Background and Purpose: Magnetic resonance imaging (MRI) helps with brain development analysis and disease diagnosis. Brain volumes measured from different ages using MRI provides useful information in clinical evaluation and research. Therefore, we trained machine learning models that predict the brain age gap of healthy subjects in the East Asian population using T1 brain MRI volume images. Methods: In total, 154 T1-weighted MRIs of healthy subjects (55-83 years of age) were collected from an East Asian community. The information of age, gender, and education level was collected for each participant. The MRIs of the participants were preprocessed using FreeSurfer(https://surfer.nmr.mgh.harvard.edu/) to collect the brain volume data. We trained the models using different supervised machine learning regression algorithms from the scikit-learn (https://scikit-learn.org/) library. Results: The trained models comprised 19 features that had been reduced from 55 brain volume labels. The algorithm BayesianRidge (BR) achieved a mean absolute error (MAE) and r squared (R2) of 3 and 0.3 years, respectively, in predicting the age of the new subjects compared to other regression methods. The results of feature importance analysis showed that the right pallidum, white matter hypointensities on T1-MRI scans, and left hippocampus comprise some of the essential features in predicting brain age. Conclusions: The MAE and R2 accuracies of the BR model predicting brain age gap in the East Asian population showed that the model could reduce the dimensionality of neuroimaging data to provide a meaningful biomarker for individual brain aging.

Development of Multilayer Perceptron Model for the Prediction of Alcohol Concentration of Makgeolli

  • Kim, JoonYong;Rho, Shin-Joung;Cho, Yun Sung;Cho, EunSun
    • Journal of Biosystems Engineering
    • /
    • v.43 no.3
    • /
    • pp.229-236
    • /
    • 2018
  • Purpose: Makgeolli is a traditional alcoholic beverage made from rice with a fermentation starter called "nuruk." The concentration of alcohol in makgeolli depends on the temperature of the fermentation tank. It is important to monitor the alcohol concentration to manage the makgeolli production process. Methods: Data were collected from 84 makgeolli fermentation tanks over a year period. Independent variables included the temperatures of the tanks and the room where the tanks were located, as well as the quantity, acidity, and water concentration of the source. Software for the multilayer perceptron model (MLP) was written in Python using the Scikit-learn library. Results: Many models were created for which the optimization converged within 100 iterations, and their coefficients of determination $R^2$ were considerably high. The coefficient of determination $R^2$ of the best model with the training set and the test set were 0.94 and 0.93, respectively. The fact that the difference between them was very small indicated that the model was not overfitted. The maximum and minimum error was approximately 2% and the total MSE was 0.078%. Conclusions: The MLP model could help predict the alcohol concentration and to control the production process of makgeolli. In future research, the optimization of the production process will be studied based on the model.

Pig Image Learning for Improving Weight Measurement Accuracy

  • Jonghee Lee;Seonwoo Park;Gipou Nam;Jinwook Jang;Sungho Lee
    • Journal of the Korea Society of Computer and Information
    • /
    • v.29 no.7
    • /
    • pp.33-40
    • /
    • 2024
  • The live weight of livestock is important information for managing their health and housing conditions, and it can be used to determine the optimal amount of feed and the timing of shipment. In general, it takes a lot of human resources and time to weigh livestock using a scale, and it is not easy to measure each stage of growth, which prevents effective breeding methods such as feeding amount control from being applied. In this paper, we aims to improve the accuracy of weight measurement of piglets, weaned pigs, nursery pigs, and fattening pigs by collecting, analyzing, learning, and predicting video and image data in animal husbandry and pig farming. For this purpose, we trained using Pytorch, YOLO(you only look once) 5 model, and Scikit Learn library and found that the actual and prediction graphs showed a similar flow with a of RMSE(root mean square error) 0.4%. and MAPE(mean absolute percentage error) 0.2%. It can be utilized in the mammalian pig, weaning pig, nursery pig, and fattening pig sections. The accuracy is expected to be continuously improved based on variously trained image and video data and actual measured weight data. It is expected that efficient breeding management will be possible by predicting the production of pigs by part through video reading in the future.

A Machine Learning-Based Encryption Behavior Cognitive Technique for Ransomware Detection (랜섬웨어 탐지를 위한 머신러닝 기반 암호화 행위 감지 기법)

  • Yoon-Cheol Hwang
    • Journal of Industrial Convergence
    • /
    • v.21 no.12
    • /
    • pp.55-62
    • /
    • 2023
  • Recent ransomware attacks employ various techniques and pathways, posing significant challenges in early detection and defense. Consequently, the scale of damage is continually growing. This paper introduces a machine learning-based approach for effective ransomware detection by focusing on file encryption and encryption patterns, which are pivotal functionalities utilized by ransomware. Ransomware is identified by analyzing password behavior and encryption patterns, making it possible to detect specific ransomware variants and new types of ransomware, thereby mitigating ransomware attacks effectively. The proposed machine learning-based encryption behavior detection technique extracts encryption and encryption pattern characteristics and trains them using a machine learning classifier. The final outcome is an ensemble of results from two classifiers. The classifier plays a key role in determining the presence or absence of ransomware, leading to enhanced accuracy. The proposed technique is implemented using the numpy, pandas, and Python's Scikit-Learn library. Evaluation indicators reveal an average accuracy of 94%, precision of 95%, recall rate of 93%, and an F1 score of 95%. These performance results validate the feasibility of ransomware detection through encryption behavior analysis, and further research is encouraged to enhance the technique for proactive ransomware detection.