Volume 1 Issue 2
-
In this paper, we present a new method for classifying malicious URLs that reduces the learning difficulties caused by unfamiliar and difficult information-security terminology. The study extracts only visually distinguishable features of the URL structure, compares supervised learning algorithms trained on them, and uses the feature contribution values of the best-performing algorithm to identify the features that most affect malicious URL classification. The research data, obtained from Kaggle, consist of 7,046 malicious URLs and 7,046 normal URLs. Among the three supervised learning algorithms used (Decision Tree, Support Vector Machine, and Logistic Regression), the Decision Tree showed the best performance, with 83% accuracy, an F1-score of 83.1%, and a recall of 83.6%. The Decision Tree's feature contributions confirmed that, among the three visually distinguishable features (use of https, presence of a subdomain, and presence of a prefix or suffix), use of https had the highest contribution value. Because unfamiliar and difficult terms have so far made this topic hard to learn, this study is expected to provide an intuitive judgment method that requires no explanation of terminology and to be useful in the field of malicious URL detection.
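As a rough illustration of what "visually distinguishable" features might look like in practice, the sketch below derives the three features named in the abstract from a URL string. The function name `extract_visual_features` and the exact feature definitions (more than two host labels counts as a subdomain, a hyphen in the host counts as a prefix or suffix) are assumptions made here, not the paper's definitions.

```python
from urllib.parse import urlsplit


def extract_visual_features(url: str) -> dict:
    """Return three by-eye URL features (illustrative definitions, not the paper's)."""
    # urlsplit only recognises the host when a scheme separator is present.
    parts = urlsplit(url if "://" in url else "//" + url)
    host = (parts.hostname or "").lower()
    labels = [label for label in host.split(".") if label]
    # Ignore a leading "www" so it is not counted as a subdomain.
    if labels and labels[0] == "www":
        labels = labels[1:]
    return {
        "uses_https": int(parts.scheme == "https"),
        # Assumption: more than two labels (a.example.com) means a subdomain.
        "has_subdomain": int(len(labels) > 2),
        # Assumption: a hyphen in the host marks a prefix/suffix separator.
        "has_prefix_suffix": int("-" in host),
    }
```

The label-count rule is deliberately simple and would misread hosts under multi-part suffixes such as `example.co.kr`; a fuller version would need a public-suffix list.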
-
Depression is a mental disorder characterized by a lack of enthusiasm and feelings of sadness that significantly impairs daily functioning. In 2018, book sales in the essay genre increased, driven in particular by the popularity of "healing essays." This trend is seen as challenging the negative image and prejudices associated with depression. In 2021, the proportion of depression patients in their 20s rose significantly, which is attributed to factors such as job-related stress, interpersonal issues, and financial burdens. Additionally, there is a strong correlation between depression and suicidal thoughts, particularly among individuals who have experienced feelings of depression. Despite the increasing prevalence of depression among young adults, research in this area is lacking. To address this gap, this paper applies statistical tools such as logistic regression and chi-squared tests. The analysis identifies several independent variables associated with feelings of depression and sheds light on the relationships among these factors.
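For readers unfamiliar with the chi-squared test of independence mentioned above, the following minimal sketch computes the Pearson statistic for a 2x2 contingency table. The counts are invented purely for illustration and are not taken from the study; the function name `chi_squared_2x2` is likewise hypothetical.

```python
def chi_squared_2x2(table):
    """Pearson chi-squared statistic for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts, invented for illustration only:
# rows = factor present / absent, columns = felt depressed / did not.
example = [[30, 70], [15, 85]]
stat = chi_squared_2x2(example)
# With 1 degree of freedom, the 5% critical value is about 3.841.
significant = stat > 3.841
```

A statistic above the critical value suggests that the row factor and the outcome are not independent; it says nothing about the direction or size of the effect, which is where a logistic regression would come in.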
-
This study aims to enhance the accuracy of fine dust predictions by analyzing factors within the local environment in addition to atmospheric conditions. Meteorological and air pollution data were used to represent the atmospheric environment, and additional regional factors that contribute to fine dust generation, namely traffic volume and electricity transaction data, were incorporated sequentially. XGBoost, Random Forest, and an ANN (Artificial Neural Network) were employed for the analysis. As variables were added, all algorithms demonstrated improved performance. Particularly noteworthy was the Artificial Neural Network, which achieved an MAE of 6.25 when using only the atmospheric variables. Adding traffic volume reduced the MAE to 5.49, and further adding electricity transaction data reduced it to 4.61. This research provides valuable insights for proactive measures against air pollution by predicting future fine dust levels.
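To make the reported metric concrete, the sketch below shows how a mean absolute error (MAE) is computed and how the three ANN values quoted above translate into relative reductions. The helper names and the dictionary labels are assumptions introduced here; only the three MAE values come from the abstract.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# ANN MAE values as reported in the abstract (labels are ours).
reported_mae = {
    "atmospheric only": 6.25,
    "+ traffic volume": 5.49,
    "+ electricity transactions": 4.61,
}
baseline = reported_mae["atmospheric only"]
reductions = {
    name: relative_reduction(baseline, value)
    for name, value in reported_mae.items()
}
```

On these figures, adding traffic volume lowers the MAE by roughly 12% relative to the atmospheric-only baseline, and adding electricity transaction data as well lowers it by roughly 26%.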
-
The purpose of this study was to compare the performance of multiple regression models in predicting the energy consumption of the steel industry. Independent variables were selected by considering the correlations among attributes such as CO2 concentration, NSM, Week Status, Day of week, and Load Type, and preprocessing was performed to address multicollinearity. During preprocessing, we evaluated the linear and nonlinear relationships between attributes through correlation analysis. In particular, we selected variables with high correlation and included only appropriate variables in the final model to prevent multicollinearity problems. Among the regression models trained, Boosted Decision Tree Regression showed the best predictive performance. By combining multiple decision trees, the ensemble learning in this model effectively learned complex patterns while preventing overfitting. These predictive models are therefore expected to provide important information for improving energy efficiency and supporting management decision-making in the steel industry. In future work, we plan to improve model performance by collecting more data and expanding the set of variables, and to consider applying the model with interactions with external factors taken into account.
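The abstract does not spell out how multicollinearity was handled. One common approach, sketched here under that caveat, is to drop any candidate variable whose Pearson correlation with an already-kept variable exceeds a threshold. The function names and the 0.9 threshold are assumptions for illustration, and the sketch assumes no column is constant.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_collinear(columns, threshold=0.9):
    """Keep columns in order, skipping any that is highly correlated with a kept one."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Because the filter is greedy, the order of `columns` decides which member of a correlated pair survives, so the variables considered most important would be listed first.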