• Title/Summary/Keyword: Regression algorithm


An Ensemble Classification of Mental Health in Malaysia related to the Covid-19 Pandemic using Social Media Sentiment Analysis

  • Nur 'Aisyah Binti Zakaria Adli;Muneer Ahmad;Norjihan Abdul Ghani;Sri Devi Ravana;Azah Anir Norman
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.2 / pp.370-396 / 2024
  • The World Health Organization (WHO) declared COVID-19 a Public Health Emergency of International Concern on 30 January 2020 and characterized it as a pandemic on 11 March 2020. The lifestyle of people all over the world has changed since. In many cases, the pandemic appears to have led to severe mental disorders, anxiety, and depression. Researchers have mostly relied on surveys to identify the impact of the pandemic on mental health. Although surveys can generate higher-quality, tailored, and more specific data, social media offers valuable insight into the impact of the pandemic on mental health. Because people feel connected on social media, this study aims to capture people's sentiments about the pandemic as they relate to mental health issues. Word Cloud was used to visualize and identify the most frequent keywords related to COVID-19 and mental health disorders. This study employs Majority Voting Ensemble (MVE) classification and individual classifiers such as Naïve Bayes (NB), Support Vector Machine (SVM), and Logistic Regression (LR) to classify the sentiment of tweets. The tweets were labeled as positive, neutral, or negative using the Valence Aware Dictionary and sEntiment Reasoner (VADER). Confusion matrices and classification reports provide the precision, recall, and F1-score used to identify the best algorithm for classifying the sentiments.
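
The majority-voting step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-tweet predictions are made-up values, and the tie-breaking rule (the earliest-listed classifier wins) is an assumption, since the abstract does not say how ties are resolved.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; on a tie, the earliest-listed label wins (assumed rule)."""
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label

# Hypothetical per-tweet outputs from three base classifiers (NB, SVM, LR).
nb_preds = ["positive", "negative", "neutral"]
svm_preds = ["positive", "neutral", "neutral"]
lr_preds = ["negative", "negative", "positive"]

# Combine the three votes tweet by tweet.
ensemble = [majority_vote(votes) for votes in zip(nb_preds, svm_preds, lr_preds)]
print(ensemble)
```

With three classifiers and three classes, a three-way tie is possible, which is why some tie-breaking rule is needed.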

Prediction of Venous Trans-Stenotic Pressure Gradient Using Shape Features Derived From Magnetic Resonance Venography in Idiopathic Intracranial Hypertension Patients

  • Chao Ma;Haoyu Zhu;Shikai Liang;Yuzhou Chang;Dapeng Mo;Chuhan Jiang;Yupeng Zhang
    • Korean Journal of Radiology / v.25 no.1 / pp.74-85 / 2024
  • Objective: Idiopathic intracranial hypertension (IIH) is a condition of unknown etiology associated with venous sinus stenosis. This study aimed to develop a magnetic resonance venography (MRV)-based radiomics model for predicting a high trans-stenotic pressure gradient (TPG) in IIH patients diagnosed with venous sinus stenosis. Materials and Methods: This retrospective study included 105 IIH patients (median age [interquartile range], 35 years [27-42 years]; female:male, 82:23) who underwent MRV and catheter venography complemented by venous manometry. Contrast-enhanced MRV was performed on a 1.5 Tesla system, and the images were reconstructed using a standard algorithm. Shape features were derived from MRV images via the PyRadiomics package and selected using the least absolute shrinkage and selection operator (LASSO) method. A radiomics score for predicting high TPG (≥ 8 mmHg) in IIH patients was formulated using multivariable logistic regression; its discrimination performance was assessed using the area under the receiver operating characteristic curve (AUROC). A nomogram was constructed by incorporating the radiomics scores and clinical features. Results: Data from 105 patients were randomly divided into two distinct datasets for model training (n = 73; 50 and 23 with and without high TPG, respectively) and testing (n = 32; 22 and 10 with and without high TPG, respectively). Three informative shape features were identified in the training dataset: least axis length, sphericity, and maximum three-dimensional diameter. The radiomics score for predicting high TPG in IIH patients demonstrated an AUROC of 0.906 (95% confidence interval, 0.836-0.976) in the training dataset and 0.877 (95% confidence interval, 0.755-0.999) in the test dataset. The nomogram showed good calibration. Conclusion: Our study demonstrates the feasibility of a novel model for predicting high TPG in IIH patients using radiomics analysis of noninvasive MRV-based shape features. This information may aid clinicians in identifying patients who may benefit from stenting.
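
For readers unfamiliar with the AUROC reported above, it can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison. The scores and labels are hypothetical and are not the study's data.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical radiomics scores; label 1 = high TPG, label 0 = not high TPG.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.4]
labels = [1, 1, 1, 0, 0, 0]

print(round(auroc(scores, labels), 3))
```

The pairwise form is quadratic in the sample size, which is acceptable for small cohorts; rank-based formulas give the same value more efficiently.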

Clinicoradiological Characteristics in the Differential Diagnosis of Follicular-Patterned Lesions of the Thyroid: A Multicenter Cohort Study

  • Jeong Hoon Lee;Eun Ju Ha;Da Hyun Lee;Miran Han;Jung Hyun Park;Ji-hoon Kim
    • Korean Journal of Radiology / v.23 no.7 / pp.763-772 / 2022
  • Objective: Preoperative differential diagnosis of follicular-patterned lesions is challenging. This multicenter cohort study investigated the clinicoradiological characteristics relevant to the differential diagnosis of such lesions. Materials and Methods: From June to September 2015, 4787 thyroid nodules (≥ 1.0 cm) with a final diagnosis of benign follicular nodule (BN, n = 4461), follicular adenoma (FA, n = 136), follicular carcinoma (FC, n = 62), or follicular variant of papillary thyroid carcinoma (FVPTC, n = 128) collected from 26 institutions were analyzed. The clinicoradiological characteristics of the lesions were compared among the different histological types using multivariable logistic regression analyses. The relative importance of the characteristics that distinguished histological types was determined using a random forest algorithm. Results: Compared to BN (as the control group), the distinguishing features of follicular-patterned neoplasms (FA, FC, and FVPTC) were patient's age (odds ratio [OR], 0.969 per 1-year increase), lesion diameter (OR, 1.054 per 1-mm increase), presence of solid composition (OR, 2.255), presence of hypoechogenicity (OR, 2.181), and presence of halo (OR, 1.761) (all p < 0.05). Compared to FA (as the control), FC differed with respect to lesion diameter (OR, 1.040 per 1-mm increase) and rim calcifications (OR, 17.054), while FVPTC differed with respect to patient age (OR, 0.966 per 1-year increase), lesion diameter (OR, 0.975 per 1-mm increase), macrocalcifications (OR, 3.647), and non-smooth margins (OR, 2.538) (all p < 0.05). The five important features for the differential diagnosis of follicular-patterned neoplasms (FA, FC, and FVPTC) from BN are maximal lesion diameter, composition, echogenicity, orientation, and patient's age. The most important features distinguishing FC and FVPTC from FA are rim calcifications and macrocalcifications, respectively. Conclusion: Although follicular-patterned lesions have overlapping clinical and radiological features, the distinguishing features identified in our large clinical cohort may provide valuable information for preoperative distinction between them and decision-making regarding their management.
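
The per-unit odds ratios above can be rescaled to larger increments. In a logistic regression model the log-odds are linear in the predictor, so an OR per 1 unit becomes OR raised to the power k for a k-unit increase. The sketch below applies this to two of the reported values. The 10-unit increments are chosen for illustration only, and the helper name is hypothetical.

```python
import math

def scaled_odds_ratio(or_per_unit, units):
    """Odds ratio for a change of `units`, given the odds ratio per single unit."""
    return math.exp(math.log(or_per_unit) * units)

# Reported ORs versus benign nodules: 1.054 per 1-mm diameter, 0.969 per 1-year age.
or_diameter_10mm = scaled_odds_ratio(1.054, 10)
or_age_10yr = scaled_odds_ratio(0.969, 10)

print(f"10 mm larger: OR ~ {or_diameter_10mm:.2f}")
print(f"10 years older: OR ~ {or_age_10yr:.2f}")
```

Under this model, a lesion 10 mm larger has roughly 1.7 times the odds of being a follicular-patterned neoplasm, and a patient 10 years older has roughly 0.73 times the odds, all other covariates held fixed.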

Protecting Accounting Information Systems using Machine Learning Based Intrusion Detection

  • Biswajit Panja
    • International Journal of Computer Science & Network Security / v.24 no.5 / pp.111-118 / 2024
  • In general, a network-based intrusion detection system is designed to detect malicious behavior directed at a network or its resources. The key goal of this paper is to examine network data and identify whether it is normal or anomalous traffic, specifically for accounting information systems. Today there are a variety of approaches for detecting various forms of network-based intrusion. In this paper, we use supervised machine learning techniques. Classification models are trained and validated on the data: the system is trained on a training dataset, and the trained system is then used to detect intrusions in the testing dataset. The proposed method detects whether network data is normal or an anomaly, which helps prevent unauthorized activity on the network and the systems under it. Decision Tree and K-Nearest Neighbor are applied in the proposed model to classify network traffic behavior as normal or abnormal. In addition, Logistic Regression and Support Vector Classification algorithms are used in our model to support the proposed concepts. Furthermore, a feature selection method is used to extract valuable information from the dataset and enhance the efficiency of the proposed approach. A Random Forest algorithm is used, which helps the system identify crucial features and focus on them rather than on all the features. The experimental findings revealed that the suggested method for network intrusion detection has a negligible false alarm rate, with accuracy expected to be between 95% and 100%. Given this high precision, the approach can be used to detect network intrusions and prevent vulnerabilities on the network.
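
The two headline metrics in this abstract, accuracy and false alarm rate, can both be derived from a confusion matrix. The following is a small sketch under assumed conventions: "anomaly" is treated as the positive class, and the false alarm rate is taken as FP / (FP + TN). The labels are invented for illustration.

```python
def detection_metrics(y_true, y_pred, positive="anomaly"):
    """Return (accuracy, false_alarm_rate) for a binary intrusion detector."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1  # normal traffic flagged as an anomaly: a false alarm
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    accuracy = (tp + tn) / len(y_true)
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, false_alarm_rate

# Hypothetical ground truth and classifier output for eight traffic records.
y_true = ["normal", "normal", "anomaly", "anomaly", "normal", "anomaly", "normal", "normal"]
y_pred = ["normal", "anomaly", "anomaly", "anomaly", "normal", "normal", "normal", "normal"]

accuracy, far = detection_metrics(y_true, y_pred)
print(accuracy, far)
```

Reporting both values matters because a detector can reach high accuracy on imbalanced traffic while still raising many false alarms.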

The Analysis of Research Trends in Electric Vehicle using Topic Modeling (토픽 모델링을 이용한 전기차 연구 동향 분석)

  • Yuan Chen;Seok-Swoo Cho
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.17 no.4 / pp.255-265 / 2024
  • To address environmental challenges and improve energy efficiency, the adoption of electric vehicles has led to a surge in related research. However, to comprehensively understand the research trends within the field of electric vehicles, it is necessary to systematically analyze vast amounts of data. This study systematically analyzed research trends in the field of electric vehicles and identified key research topics through LDA topic modeling, based on 36,519 papers related to electric vehicles collected from the SCIE database. The data analysis revealed a total of 10 major topics, of which three were identified as hot topics showing an upward trend: Electric Vehicle Charging Infrastructure, Energy and Environmental Policy, and Optimization and Algorithms. Conversely, five topics were identified as cold topics exhibiting a downward trend: Battery Temperature and Cooling, Battery Materials and Chemistry, Motor and Mechanical Design, Control Strategies and Systems, and Battery Components and Materials. This study provides basic data for understanding the current research trends in electric vehicles and offers valuable information for researchers in selecting research topics related to electric vehicles.

Corpus of Eye Movements in L3 Spanish Reading: A Prediction Model

  • Hui-Chuan Lu;Li-Chi Kao;Zong-Han Li;Wen-Hsiang Lu;An-Chung Cheng
    • Asia Pacific Journal of Corpus Research / v.5 no.1 / pp.23-36 / 2024
  • This research centers on the Taiwan Eye-Movement Corpus of Spanish (TECS), a specially created corpus comprising eye-tracking data from Chinese-speaking learners of Spanish as a third language in Taiwan. Its primary purpose is to explore the broad utility of TECS in understanding language learning processes, particularly the initial stages of language learning. Constructing this corpus involves gathering data on eye-tracking, reading comprehension, and language proficiency to develop a machine-learning model that predicts learner behaviors and is subsequently validated with a predictability test. The focus is on examining attention in input processing and its relationship to language learning outcomes. The TECS eye-tracking data consists of indicators derived from eye movement recordings made while participants read Spanish sentences with temporal references. These indicators are obtained from eye movement experiments focusing on tense verbal inflections and temporal adverbs. Chinese expresses tense using aspect markers, lexical references, and contextual cues, differing significantly from inflectional languages like Spanish. Chinese-speaking learners of Spanish therefore face particular challenges in learning verbal morphology and tenses. The data from eye movement experiments were structured into feature vectors, with learner behaviors serving as class labels. After categorizing the collected data, we used two types of machine learning methods for classification and regression: Random Forests and the k-nearest neighbors algorithm (KNN). By leveraging these algorithms, we predicted learner behaviors and conducted performance evaluations to enhance our understanding of the nexus between learner behaviors and the language learning process. Future research may further enrich TECS by gathering data from subsequent eye-movement experiments, specifically targeting various Spanish tenses and temporal lexical references during text reading. These endeavors promise to broaden and refine the corpus, advancing our understanding of language processing.
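
To make the "feature vectors with class labels" setup concrete, here is a minimal k-nearest neighbors classifier. Everything in it is hypothetical: the two features (imagined as normalized eye-movement indicators), the "high"/"low" labels, and the choice of k = 3 are illustrative assumptions, not values from TECS.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k training points closest to `query`."""
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical normalized indicators per learner, e.g. (fixation time on verb, regression rate).
train_X = [(0.2, 0.1), (0.3, 0.2), (0.25, 0.15), (0.8, 0.9), (0.7, 0.8), (0.9, 0.7)]
train_y = ["high", "high", "high", "low", "low", "low"]

print(knn_predict(train_X, train_y, (0.28, 0.18)))
print(knn_predict(train_X, train_y, (0.75, 0.85)))
```

Because KNN relies on raw distances, features on different scales usually need normalizing first, which is why the toy values are already in the 0 to 1 range.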

Simultaneous Optimization of KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.139-157 / 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained on a randomly chosen feature subspace of the original feature set, and predictions from each ensemble member are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving an individual KNN model. The k parameter of KNN base classifiers and the feature subsets selected for base classifiers play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposed a new ensemble method that improves upon the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve the prediction accuracy of the ensemble model. The proposed model was applied to a bankruptcy prediction problem by using a real dataset from Korean companies. The research data included 1,800 externally non-audited firms, of which 900 were bankrupt and 900 were non-bankrupt. Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent sample t-test of each financial ratio as an input variable and bankruptcy or non-bankruptcy as an output variable. Of these, 24 financial ratios were selected by using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other for avoiding overfitting. The prediction accuracy on the latter portion was used to determine the fitness value. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the performances of the proposed model and other models. To evaluate the effectiveness of the proposed model, the classification accuracy of the proposed model was compared with that of other models. The Q-statistic values and average classification accuracies of base classifiers were investigated. The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
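
The Q-statistic mentioned above is a standard pairwise diversity measure for classifier ensembles. For two classifiers it is Q = (N11·N00 − N01·N10) / (N11·N00 + N01·N10), where N11 counts the cases both classify correctly, N00 the cases both get wrong, and N10 and N01 the cases only one gets right. The sketch below assumes this usual definition (the abstract does not spell it out) and uses invented correctness vectors.

```python
def q_statistic(correct_a, correct_b):
    """Yule's Q for two classifiers, given per-case correctness flags (1 = correct, 0 = wrong)."""
    n11 = sum(1 for a, b in zip(correct_a, correct_b) if a and b)
    n00 = sum(1 for a, b in zip(correct_a, correct_b) if not a and not b)
    n10 = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    n01 = sum(1 for a, b in zip(correct_a, correct_b) if not a and b)
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0

# Hypothetical correctness of two KNN base classifiers on eight validation cases.
knn_a = [1, 1, 1, 0, 0, 1, 0, 1]
knn_b = [1, 1, 0, 0, 1, 1, 0, 0]

print(q_statistic(knn_a, knn_b))
```

Q ranges from -1 to 1. Values near 1 mean the two classifiers err on the same cases, while values near 0 or below indicate the diversity that ensembles benefit from.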

Simultaneous estimation of fatty acids contents from soybean seeds using Fourier transform infrared spectroscopy and gas chromatography by multivariate analysis (적외선 분광스펙트럼 및 기체크로마토그라피 분석 데이터의 다변량 통계분석을 이용한 대두 종자 지방산 함량예측)

  • Ahn, Myung Suk;Ji, Eun Yee;Song, Seung Yeob;Ahn, Joon Woo;Jeong, Won Joong;Min, Sung Ran;Kim, Suk Weon
    • Journal of Plant Biotechnology / v.42 no.1 / pp.60-70 / 2015
  • The aim of this study was to investigate whether Fourier transform infrared (FT-IR) spectroscopy can be applied to the simultaneous determination of fatty acid contents in different soybean cultivars. A total of 153 lines of soybean (Glycine max Merrill) were examined by FT-IR spectroscopy. Quantification of fatty acids from the soybean lines was confirmed by quantitative gas chromatography (GC) analysis. Quantitative spectral variation among the soybean lines was observed in the amide bond region (1,700-1,500 cm⁻¹), the phosphodiester group region (1,500-1,300 cm⁻¹), and the sugar region (1,200-1,000 cm⁻¹) of the FT-IR spectra. Quantitative prediction models for the contents of 5 individual fatty acids (palmitic acid, stearic acid, oleic acid, linoleic acid, linolenic acid) were established from the FT-IR spectra using a partial least squares (PLS) regression algorithm. In cross validation, there were high correlations (R² ≥ 0.97) between the contents of the 5 fatty acids predicted by PLS regression modeling from FT-IR spectra and the contents measured by GC. In external validation, palmitic acid (R² = 0.8002), oleic acid (R² = 0.8909), and linoleic acid (R² = 0.815) were predicted with good accuracy, while predictions for stearic acid (R² = 0.4598) and linolenic acid (R² = 0.6868) had relatively lower accuracy. These results show that FT-IR spectra combined with multivariate analysis can be used to predict fatty acid contents in soybean lines, with accuracy varying by fatty acid. Therefore, we suggest that the PLS prediction system for fatty acid contents using FT-IR analysis could be applied as a rapid, high-throughput screening tool in breeding for modified fatty acid composition in soybean and could help accelerate conventional breeding.
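
The R² values quoted above compare predicted and GC-measured contents. One common way to compute such a figure is the coefficient of determination, sketched below. Whether the authors used this form or a squared correlation coefficient is not stated, so treat the formula choice as an assumption. The numbers are illustrative, not the study's measurements.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1 - ss_res / ss_tot

# Hypothetical fatty acid contents (%): measured by GC vs. predicted by a PLS model.
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 11.5, 14.5, 15.5]

print(r_squared(measured, predicted))
```

A value near 1 means the predictions track the measurements closely. Values such as the 0.46 reported for stearic acid indicate that more than half of the measured variance is left unexplained.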

A Study on the Prediction Model of Stock Price Index Trend based on GA-MSVM that Simultaneously Optimizes Feature and Instance Selection (입력변수 및 학습사례 선정을 동시에 최적화하는 GA-MSVM 기반 주가지수 추세 예측 모형에 관한 연구)

  • Lee, Jong-sik;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.23 no.4 / pp.147-168 / 2017
  • Accurate stock market forecasting has long been studied in academia, and many forecasting models based on a variety of techniques now exist. Recently, many attempts have been made to predict the stock index using various machine learning methods, including Deep Learning. Although both fundamental analysis and technical analysis are used in traditional stock investment, technical analysis is more useful for short-term trading prediction and for applying statistical and mathematical techniques. Most studies using technical indicators have modeled stock price prediction as a binary classification (rising or falling) of future market movement, usually for the next trading day. However, binary classification has many drawbacks for predicting trends, identifying trading signals, or signaling portfolio rebalancing. In this study, we predict the stock index trend by extending the existing binary scheme to a multi-class scheme (upward trend, boxed, downward trend). Techniques such as Multinomial Logistic Regression Analysis (MLOGIT), Multiple Discriminant Analysis (MDA), or Artificial Neural Networks (ANN) can be used to solve this multi-class problem; we instead propose an optimization model that uses a Genetic Algorithm as a wrapper to improve the performance of Multi-class Support Vector Machines (MSVM), which have proved superior in prediction performance. In particular, the proposed model, named GA-MSVM, is designed to maximize model performance by simultaneously optimizing the kernel function parameters of MSVM, the selection of input variables (feature selection), and instance selection. To verify the performance of the proposed model, we applied it to real data. The results show that the proposed method outperforms the conventional MSVM, which has been known to show the best prediction performance to date, as well as existing artificial intelligence and data mining techniques such as MDA, MLOGIT, and CBR. In particular, instance selection was found to play a very important role in predicting the stock index trend, contributing more to model improvement than the other factors. To verify the usefulness of GA-MSVM, we applied it to trend forecasting for Korea's KOSPI200 stock index. Our research is primarily aimed at predicting trend segments in order to capture trading signals or short-term trend transition points. The experimental dataset includes technical indicators such as the price and volatility index of the KOSPI200 stock index (2004-2017) and macroeconomic data (interest rate, exchange rate, S&P 500, etc.). Using a variety of statistical methods, including one-way ANOVA and stepwise MDA, 15 indicators were selected as candidate independent variables. The dependent variable, trend classification, took one of three states: 1 (upward trend), 0 (boxed), and -1 (downward trend). For each class, 70% of the data was used for training and the remaining 30% for verification. To verify the performance of the proposed model, comparative experiments were conducted with MDA, MLOGIT, CBR, ANN, and MSVM. MSVM adopted the One-Against-One (OAO) approach, which is known as the most accurate among the various MSVM approaches. Although there are some limitations, the final experimental results demonstrate that the proposed model, GA-MSVM, performs at a significantly higher level than all comparative models.
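
The One-Against-One (OAO) scheme trains one binary classifier per pair of classes and lets them vote. The sketch below shows only that voting logic for the three trend states 1, 0, and -1. The pairwise classifiers here are hypothetical threshold rules on a single made-up feature, standing in for the binary SVMs a real MSVM would use, and the tie-breaking rule is an assumption.

```python
from collections import Counter
from itertools import combinations

def one_against_one_predict(pairwise_classifiers, classes, x):
    """Collect one vote per class pair and return the class with the most votes.

    Ties are resolved in favor of the class listed first in `classes` (assumed rule).
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_classifiers[(a, b)](x)] += 1
    return max(classes, key=lambda c: votes[c])

classes = [1, 0, -1]  # upward trend, boxed, downward trend

# Hypothetical stand-ins for trained binary SVMs, keyed by class pair.
pairwise = {
    (1, 0): lambda x: 1 if x > 0.5 else 0,
    (1, -1): lambda x: 1 if x > 0.0 else -1,
    (0, -1): lambda x: 0 if x > -0.5 else -1,
}

for x in (0.8, 0.1, -0.9):
    print(x, one_against_one_predict(pairwise, classes, x))
```

With three classes OAO needs three pairwise classifiers. In general it needs n(n-1)/2, so the cost grows quadratically with the number of classes.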

The Effect of Mean Brightness and Contrast of Digital Image on Detection of Watermark Noise (워터 마크 잡음 탐지에 미치는 디지털 영상의 밝기와 대비의 효과)

  • Kham Keetaek;Moon Ho-Seok;Yoo Hun-Woo;Chung Chan-Sup
    • Korean Journal of Cognitive Science / v.16 no.4 / pp.305-322 / 2005
  • Watermarking is a widely employed method for protecting the copyright of a digital image, in which the owner's unique image is embedded into the original image. A stronger level of watermark insertion helps the watermark survive extraction even after distortions such as changes in image size or resolution. At the same time, the level should be moderate enough that the watermark does not become visible to humans. Finding a balance between these two is crucial in watermarking. In typical watermarking algorithms, a predefined watermark strength, computed from the physical difference between the original and embedded images, is applied uniformly to all images. However, the mean brightness or contrast of the surrounding image, rather than the absolute brightness of an object, could affect human sensitivity for object detection. In the present study, we examined whether the detectability of watermark noise might be altered by image statistics: the mean brightness and contrast of the image. As a first step, we made nine fundamental images by varying the brightness and contrast of the original image. For each fundamental image, the detectability of watermark noise was measured. The results showed that the strength of watermark noise required for detection increased as the brightness and contrast of the fundamental image increased. We fitted the data to a regression line that can be used to estimate the watermark strength for a given image with a certain brightness and contrast. Although other factors must be taken into consideration before directly applying this formula to an actual watermarking algorithm, an adaptive watermarking algorithm could be built on this formula using image statistics such as brightness and contrast.
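
Fitting a regression line of the kind described above can be sketched with ordinary least squares. This example uses a single predictor (mean brightness) for simplicity, whereas the abstract's formula involves both brightness and contrast. The data points are invented, and the names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: mean image brightness vs. watermark strength at detection threshold.
brightness = [50, 100, 150, 200]
threshold_strength = [2.0, 3.0, 4.0, 5.0]

slope, intercept = fit_line(brightness, threshold_strength)

# Estimate the detection-threshold strength for a new image with mean brightness 120.
estimate = slope * 120 + intercept
print(slope, intercept, estimate)
```

An adaptive scheme along these lines would set the embedding strength just below the estimated threshold for each image instead of using one fixed strength for all images.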
