• Title/Summary/Keyword: Random sets

Classifying Social Media Users' Stance: Exploring Diverse Feature Sets Using Machine Learning Algorithms

  • Kashif Ayyub;Muhammad Wasif Nisar;Ehsan Ullah Munir;Muhammad Ramzan
    • International Journal of Computer Science & Network Security / v.24 no.2 / pp.79-88 / 2024
  • The use of social media has become part of our daily activities. Social web channels let users generate content and share their views, opinions, and experiences on particular topics, and researchers use this content in many research areas. Sentiment analysis, one of the most active research areas of the last decade, is the process of extracting people's reviews, opinions, and sentiments. It is applied in diverse sub-areas such as subjectivity analysis, polarity detection, and emotion detection. Stance classification has emerged as a new and interesting research area: it aims to determine whether the author of a piece of content is in favor of, against, or neutral toward the target topic or issue. Stance classification is significant because it has many research applications, such as rumor stance classification, stance classification in public forums, claim stance classification, neural attention stance classification, online debate stance classification, and dialogic-properties stance classification. This study explores different feature sets, such as lexical, sentiment-specific, and dialog-based features, extracted from standard datasets in the area. Supervised learning approaches were applied, including generative algorithms such as Naïve Bayes and discriminative algorithms such as Support Vector Machine, Decision Tree, and k-Nearest Neighbor, followed by ensemble algorithms such as Random Forest and AdaBoost. The empirical results were evaluated using the standard performance measures of accuracy, precision, recall, and F-measure.
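
A minimal Python sketch of the evaluation step, using only the standard library: it computes accuracy and macro-averaged precision, recall, and F1 for three stance classes. The label names `favor`, `against`, and `neutral` and the choice of macro averaging are assumptions; the abstract does not say how per-class scores were combined.

```python
from collections import Counter

# Hypothetical stance labels used for illustration.
LABELS = ("favor", "against", "neutral")


def evaluate(y_true, y_pred, labels=LABELS):
    """Return (accuracy, macro precision, macro recall, macro F1)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label lists of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # wrongly predicted this class
            fn[truth] += 1  # missed the true class
    accuracy = sum(tp.values()) / len(y_true)
    precision, recall, f1 = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    n = len(labels)
    return accuracy, sum(precision) / n, sum(recall) / n, sum(f1) / n
```

For example, `evaluate(["favor", "against", "neutral", "favor"], ["favor", "neutral", "neutral", "against"])` gives an accuracy of 0.5 and a macro F1 of 4/9 (about 0.444).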

$^{13}$C-NMR Determination of Monomer Composition in EP Copolymers, EPB and EPDM Terpolymers

  • Lee, Kang-Bong;An, Seong-Uk;Rhee, Jae-Seong;Kweon, Jeehye;Choi, Young-Sang
    • Analytical Science and Technology / v.7 no.1 / pp.91-102 / 1994
  • The monomer compositions of a series of propylene heterophasic copolymers, propylene random copolymers, propylene random terpolymers, and ethylene-propylene-ENB terpolymers have been determined from $^{13}$C-NMR spectra. The simplified and highly resolved $^{13}$C-NMR spectra made it possible to assign the signals unambiguously and to calculate the monomer composition. Complete sets of NMR chemical-shift assignments, and methods for quantifying each monomer, are newly given for these diverse polymers. Furthermore, complete dyad, triad, tetrad, and pentad distributions could be determined. These quantitative NMR results for monomer composition are consistent with those obtained from infrared spectral data.
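
The abstract does not give the assignment equations, so the peak-to-triad assignments (the paper's actual contribution) are not reproduced here. As a hedged illustration of the arithmetic that usually follows them in an ethylene (E)/propylene (P) copolymer: once normalized triad fractions are known, the mole fraction of a monomer is the sum of the triads centered on it. The sketch also computes the triads expected under Bernoullian (random) addition for comparison. Triad intensities passed in are hypothetical inputs, and reversible triads (e.g. PEE and EEP) are assumed to be reported together under one key.

```python
# Triad keys for an E/P copolymer; reversible triads share one key.
E_CENTERED = ("EEE", "PEE", "PEP")
P_CENTERED = ("PPP", "PPE", "EPE")


def composition_from_triads(intensities):
    """Normalize triad intensities and return mole fractions (E, P)."""
    total = sum(intensities[k] for k in E_CENTERED + P_CENTERED)
    e = sum(intensities[k] for k in E_CENTERED) / total
    return e, 1.0 - e


def bernoullian_triads(p_e):
    """Triad fractions expected if monomers add at random (Bernoullian)."""
    p_p = 1.0 - p_e
    return {
        "EEE": p_e ** 3, "PEE": 2 * p_e ** 2 * p_p, "PEP": p_e * p_p ** 2,
        "PPP": p_p ** 3, "PPE": 2 * p_p ** 2 * p_e, "EPE": p_p * p_e ** 2,
    }
```

Feeding the Bernoullian triads for 30 mol% ethylene back into `composition_from_triads` recovers (0.3, 0.7); comparing measured triads with `bernoullian_triads` is one simple way to check whether a copolymer is statistically random.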

Chatting Pattern Based Game BOT Detection: Do They Talk Like Us?

  • Kang, Ah Reum;Kim, Huy Kang;Woo, Jiyoung
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.11 / pp.2866-2879 / 2012
  • Among the various security threats in online games, the use of game bots is the most serious problem. Previous studies on game bot detection have proposed many methods for finding behaviors that distinguish bots from humans, based on the fact that a bot's playing pattern differs from a human's. In this paper, we examine chat data, which reflects gamers' communication patterns, and propose a communication pattern analysis framework for online game bot detection. In massively multiplayer online role-playing games (MMORPGs), game bots use chat messages differently from normal users. We derive four types of features: network, descriptive, diversity, and text features. To measure the diversity of communication patterns, we propose lightweight summary indices that are computationally inexpensive and intuitive. For text features, we derive lexical, syntactic, and semantic features from chat contents using text mining techniques. To build the learning model for game bot detection, we test and compare three classification models: random forest, logistic regression, and lazy learning. We apply the proposed framework to AION, operated by NCsoft, a leading online game company in Korea. Our experiments show that random forest outperforms logistic regression and lazy learning. The model that uses all feature sets gives the highest performance, with a precision of 0.893 and a recall of 0.965.
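
The abstract does not define its diversity indices. A minimal stdlib Python sketch, assuming two generic indices (the distinct-message ratio and the Shannon entropy of the message distribution), shows the kind of cheap per-player statistic meant: a bot repeating one advertisement scores low on both, while varied human chat scores high. These are not necessarily the paper's indices.

```python
import math
from collections import Counter


def diversity_indices(messages):
    """Return (distinct-message ratio, Shannon entropy in bits) for a chat log."""
    if not messages:
        return 0.0, 0.0
    counts = Counter(messages)
    n = len(messages)
    distinct_ratio = len(counts) / n
    # Entropy of the message distribution: 0 when one message is repeated.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return distinct_ratio, entropy
```

For reference, the reported precision of 0.893 and recall of 0.965 correspond to an F1-score of about 0.928 (2 × 0.893 × 0.965 / (0.893 + 0.965)).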

Analysis of facial expression recognition

  • Son, Nayeong;Cho, Hyunsun;Lee, Sohyun;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.31 no.5 / pp.539-554 / 2018
  • Effective interaction between user and device is considered an important capability of IoT devices. Some applications need to recognize human facial expressions in real time and judge them accurately in order to respond correctly to situations. Therefore, much research on facial image analysis has been conducted to build more accurate and faster recognition systems. In this study, we constructed an automatic facial expression recognition system with two steps: a facial recognition step and a classification step. We compared various models built on different feature sets: pixel information, landmark coordinates, Euclidean distances among landmark points, and arctangent angles. We found that a model using only 30 principal components of the facial landmark information is fast and efficient. We applied several prediction models, including linear discriminant analysis (LDA), random forests, support vector machine (SVM), and bagging; the SVM model gives the best result. The LDA model gives the second-best prediction accuracy, but it fits and predicts faster than SVM and the other methods. Finally, we compared our method with the Microsoft Azure Emotion API and a convolutional neural network (CNN). Our method gives very competitive results.
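
The abstract lists Euclidean distances among landmark points and arctangent angles as feature sets. A minimal stdlib Python sketch of how such features can be derived from (x, y) landmark coordinates follows; it assumes each angle is the `atan2` direction of the line from one landmark to another, whereas the paper may measure angles relative to a different reference.

```python
import math
from itertools import combinations


def landmark_features(points):
    """Pairwise Euclidean distances and angles for (x, y) facial landmarks.

    Returns two lists ordered by landmark pair (i, j) with i < j. Each angle
    is atan2 of the vector from landmark i to landmark j, in radians.
    """
    distances, angles = [], []
    for (x1, y1), (x2, y2) in combinations(points, 2):
        dx, dy = x2 - x1, y2 - y1
        distances.append(math.hypot(dx, dy))
        angles.append(math.atan2(dy, dx))
    return distances, angles
```

With the 68-point landmark scheme that many face-landmark detectors use (an assumption here; the abstract does not give the count), this yields 68 × 67 / 2 = 2,278 distances and as many angles, which illustrates why a reduction such as the 30 principal components mentioned in the abstract is attractive.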

Parameter Estimation in Debris Flow Deposition Model Using Pseudo Sample Neural Network

  • Heo, Gyeongyong;Lee, Chang-Woo;Park, Choong-Shik
    • Journal of the Korea Society of Computer and Information / v.17 no.11 / pp.11-18 / 2012
  • The debris flow deposition model predicts the areas affected by debris flow and was built using a random walk model (RWM). Although the model has proved effective at predicting affected areas, it has several free parameters that are determined experimentally. Several well-known methods exist for estimating parameters; however, they cannot be applied directly to the debris flow problem because the training data are small. In this paper, a modified neural network called the pseudo sample neural network (PSNN) is proposed to overcome the sample size problem. In the training phase, PSNN uses pseudo samples generated from the existing samples. The pseudo samples smooth the solution space and reduce the probability of falling into a local optimum. As a result, PSNN can estimate parameters more robustly than traditional neural networks. These results are supported by experiments on artificial and real data sets.
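
The abstract says only that pseudo samples are generated from the existing samples, not how. One common way to do this, assumed here and not necessarily PSNN's rule, is to add small Gaussian perturbations to each input while keeping its target, which enlarges a small training set around the observed points. The function name and the `copies` and `noise` parameters are hypothetical.

```python
import random


def make_pseudo_samples(samples, copies=5, noise=0.05, seed=0):
    """Return the original (inputs, target) pairs plus jittered copies.

    Each pseudo sample adds Gaussian noise with standard deviation `noise`
    to every input value and keeps the original target. This jittering
    scheme is a hypothetical stand-in for PSNN's generation rule.
    """
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    augmented = list(samples)
    for inputs, target in samples:
        for _ in range(copies):
            jittered = [v + rng.gauss(0.0, noise) for v in inputs]
            augmented.append((jittered, target))
    return augmented
```

In practice `noise` would be scaled to the spread of each input variable; a single value is used here only to keep the sketch short.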

Fluid Flow and Solute Transport in a Discrete Fracture Network Model with Nonlinear Hydromechanical Effect

  • Jeong, U-Chang
    • Journal of Korea Water Resources Association / v.31 no.3 / pp.347-360 / 1998
  • Numerical simulations of fluid flow and solute transport in fractured rock masses are performed using a transient flow model based on a three-dimensional stochastic discrete fracture network (DFN) model, in which a hydraulic model is coupled with a mechanical model. For the solute transport simulations, we used a particle-following algorithm similar to an advective biased random walk. The purpose of this study is to predict the response of the tracer test between two deep boreholes (GPK1 and GPK2) drilled at Soultz-sous-Forêts in France, in the context of geothermal research. The data sets used were obtained from in situ circulation experiments in 1995. The transport simulation gives a mean transit time of about 5 days for non-reactive particles between the two boreholes.
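
The abstract describes particle tracking as similar to an advective biased random walk. The following stdlib Python sketch shows that idea in one dimension only: it ignores the fracture network geometry and the hydromechanical coupling, and the distance, velocity, and dispersion values are hypothetical.

```python
import random


def mean_transit_time(distance, velocity, dispersion, n_particles=500, dt=1.0, seed=1):
    """Mean first-arrival time of particles in a 1-D advective biased random walk.

    Each step moves a particle by velocity * dt (advection) plus a Gaussian
    jump with variance 2 * dispersion * dt; a particle arrives once it has
    travelled `distance`. Units are arbitrary but must be consistent.
    """
    if velocity <= 0:
        raise ValueError("velocity must be positive for particles to arrive")
    rng = random.Random(seed)
    step_sd = (2.0 * dispersion * dt) ** 0.5
    total_time = 0.0
    for _ in range(n_particles):
        x, t = 0.0, 0.0
        while x < distance:
            x += velocity * dt + rng.gauss(0.0, step_sd)
            t += dt
        total_time += t
    return total_time / n_particles
```

With `distance=100`, `velocity=1`, and `dispersion=2`, the mean arrival time comes out close to distance / velocity = 100 time units, as expected when advection dominates; in a DFN model the particle paths would instead follow connected fracture segments.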

A comparative study of the performance of machine learning algorithms to detect malicious traffic in IoT networks

  • Hyun, Mi-Jin
    • Journal of Digital Convergence / v.19 no.9 / pp.463-468 / 2021
  • Although the IoT is growing explosively with advances in technology, the spread of IoT devices, and the activation of services, the activities of various botnets are causing serious security risks and financial damage. It is therefore important to detect these botnet activities accurately and quickly. Because security in the IoT environment must operate with minimal processing performance and memory, this paper selects the minimum set of features needed for detection and compares the performance of machine learning algorithms such as KNN (K-Nearest Neighbor), Naïve Bayes, Decision Tree, and Random Forest in detecting botnet activity. Experimental results on the Bot-IoT dataset showed that, among the applied algorithms, KNN detects DDoS, DoS, and Reconnaissance attacks most effectively and efficiently.
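
The abstract reports KNN as the most effective detector. A minimal stdlib Python sketch of KNN classification follows; the two features (for example, normalized packet and byte rates per flow) and the labels are hypothetical and are not the Bot-IoT feature set selected in the paper. It requires Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Because KNN does no work at training time beyond storing the data, its cost falls on prediction, which grows with the size of the stored training set; that trade-off matters given the abstract's emphasis on minimal processing and memory.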

Predicting Surgical Complications in Adult Patients Undergoing Anterior Cervical Discectomy and Fusion Using Machine Learning

  • Arvind, Varun;Kim, Jun S.;Oermann, Eric K.;Kaji, Deepak;Cho, Samuel K.
    • Neurospine / v.15 no.4 / pp.329-337 / 2018
  • Objective: Machine learning algorithms excel at leveraging big data to identify complex patterns that can aid clinical decision-making. The objective of this study is to demonstrate the performance of machine learning models in predicting postoperative complications following anterior cervical discectomy and fusion (ACDF). Methods: Artificial neural network (ANN), logistic regression (LR), support vector machine (SVM), and random forest decision tree (RF) models were trained on a multicenter data set of patients undergoing ACDF to predict surgical complications from readily available patient data. After training, these models were compared with the predictive capability of the American Society of Anesthesiologists (ASA) physical status classification. Results: A total of 20,879 patients were identified as having undergone ACDF. After exclusion criteria were applied, the patients were divided into training (14,615 patients) and testing (6,264 patients) data sets. ANN and LR consistently outperformed ASA physical status classification in predicting every complication (p < 0.05). The ANN outperformed LR in predicting venous thromboembolism, wound complication, and mortality (p < 0.05). The SVM and RF models were no better than random chance at predicting any of the postoperative complications (p < 0.05). Conclusion: ANN and LR algorithms outperform ASA physical status classification for predicting individual postoperative complications. Additionally, neural networks have greater sensitivity than LR when predicting mortality and wound complications. With the growing size of medical data, training machine learning models on these large datasets promises to improve risk prognostication, and their ability to learn continuously makes them excellent tools in complex clinical scenarios.
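
The abstract compares the models with random chance but does not name the metric. Assuming discrimination is measured by the area under the ROC curve (AUC), where 0.5 corresponds to chance, the pairwise (Mann-Whitney) form below is a short stdlib Python sketch. It is O(n_pos × n_neg), so a sort-based version would be preferable for a test set of 6,264 patients.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels are 1 for a complication and 0 otherwise; tied scores count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A model that gives every patient the same score gets an AUC of exactly 0.5, the "random chance" baseline. Separately, the reported split of 14,615 training and 6,264 testing patients corresponds to roughly 70% and 30% of the 20,879 patients.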

Detecting Spectre Malware Binary through Function Level N-gram Comparison

  • Kim, Moon-Sun;Yang, Hee-Dong;Kim, Kwang-Jun;Lee, Man-Hee
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.6 / pp.1043-1052 / 2020
  • Signature-based malicious code detection methods share a common limitation: it is very hard for them to detect modified malicious code or new malware exploiting zero-day vulnerabilities. To overcome this limitation, many studies classify malicious code using N-grams. Although these methods can detect malicious code with high accuracy, they have difficulty identifying malicious code that consists of very short code sequences, such as Spectre. We propose a function-level N-gram comparison algorithm to effectively identify Spectre binaries. To test the validity of this algorithm, we built N-gram data sets from 165 normal binaries and 25 malicious binaries. Using Random Forest models, the performance experiments identified Spectre malicious functions with 99.99% accuracy and an F1-score of 92%.
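
The abstract does not say how function-level N-grams are compared before being passed to the Random Forest models. A minimal stdlib Python sketch, assuming opcode trigrams per function and Jaccard similarity as the comparison measure, shows the basic idea of matching an unknown function against reference functions. The opcode sequences and function names in the check are hypothetical, and the paper's Random Forest stage is not shown.

```python
def ngrams(tokens, n=3):
    """Set of contiguous n-grams in one function's opcode sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two n-gram sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(function_tokens, reference_functions, n=3):
    """Return (name, score) of the most similar reference function."""
    target = ngrams(function_tokens, n)
    scores = {
        name: jaccard(target, ngrams(tokens, n))
        for name, tokens in reference_functions.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]
```

Working per function rather than per binary keeps a very short malicious routine from being diluted by the much larger benign code around it, which is the difficulty the abstract raises for Spectre.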

Chinese-clinical-record Named Entity Recognition using IDCNN-BiLSTM-Highway Network

  • Tinglong Tang;Yunqiao Guo;Qixin Li;Mate Zhou;Wei Huang;Yirong Wu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.7 / pp.1759-1772 / 2023
  • Chinese named entity recognition (NER) is a challenging task that seeks to find, recognize, and classify various types of information elements in unstructured text. Because Chinese text has no natural word boundaries like the spaces in English text, Chinese NER is much more difficult. At present, most deep learning based NER models are built on a bidirectional long short-term memory network (BiLSTM), yet their performance still leaves room for improvement. To further improve performance on Chinese NER tasks, we propose a new NER model, IDCNN-BiLSTM-Highway, which combines the BiLSTM, the iterated dilated convolutional neural network (IDCNN), and the highway network. In our model, the IDCNN achieves multiscale context aggregation over long sequences of words. The highway network connects different network layers effectively, allowing information to pass through them smoothly without attenuation. Finally, a conditional random field (CRF) is introduced to obtain the globally optimal tag sequence. The experimental results show that, compared with other popular deep learning based NER models, our model achieves superior performance on two Chinese NER data sets, Resume and Yidu-S4k, with F1-scores of 94.98 and 77.59, respectively.
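
The abstract says a CRF layer yields the globally optimal tag sequence. In a linear-chain CRF that decoding step is done with the Viterbi algorithm; the stdlib Python sketch below assumes per-position emission scores (such as the outputs of the IDCNN-BiLSTM-Highway layers) and a table of tag-transition scores, all additive in log space. The tag set and score values in the check are hypothetical.

```python
def viterbi(emissions, transitions, tags):
    """Highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][tag] scores `tag` at position t (at least one position);
    transitions[(prev, cur)] scores moving from prev to cur, default 0.0.
    Returns (best_path, best_score).
    """
    best = {tag: emissions[0][tag] for tag in tags}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur in tags:
            # Best-scoring predecessor for `cur`, including the transition score.
            candidates = {p: best[p] + transitions.get((p, cur), 0.0) for p in tags}
            prev = max(candidates, key=candidates.get)
            new_best[cur] = candidates[prev] + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    last = max(tags, key=best.get)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1], best[last]
```

With emissions `[{"B": 2, "I": 1.5, "O": 1}, {"B": 0.5, "I": 1.0, "O": 1.2}]` and transitions `{("B", "I"): 1.0, ("O", "I"): -10.0}`, the emission scores alone favor O at the second position, but the B-to-I transition bonus makes `["B", "I"]` (score 4.0) the global optimum, which is why CRF decoding considers whole sequences rather than one position at a time.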