• Title/Summary/Keyword: Unsupervised machine learning.

Why Should I Ban You! : X-FDS (Explainable FDS) Model Based on Online Game Payment Log (X-FDS : 게임 결제 로그 기반 XAI적용 이상 거래탐지 모델 연구)

  • Lee, Young Hun;Kim, Huy Kang
    • Journal of the Korea Institute of Information Security & Cryptology, v.32 no.1, pp.25-38, 2022
  • With the diversification of payment methods and games, related financial incidents are causing serious problems for users and game companies. Recently, game companies have introduced a Fraud Detection System (FDS) for game payment systems to prevent financial incidents. However, FDS is ineffective and cannot provide clear evidence for its judgment results, as it requires constant changes to its detection patterns. In this paper, we analyze abnormal transactions in the payment log data of real game companies to generate related features. An Autoencoder, an unsupervised learning model, was used to build a model to detect abnormal transactions, which achieved over 85% accuracy. Using X-FDS (Explainable FDS) with XAI-SHAP, we found that the variables that contributed most to explaining anomaly detection were the transaction amount, the transaction medium, and the age of users. Based on X-FDS, we finally derived an improved detection model with an accuracy of 94% by fine-tuning the importance of features that adversely affect the proposed model.

Malware Detection Using Deep Recurrent Neural Networks with no Random Initialization

  • Amir Namavar Jahromi;Sattar Hashemi
    • International Journal of Computer Science & Network Security, v.23 no.8, pp.177-189, 2023
  • Malware detection is an increasingly important operational focus in cyber security, particularly given the fast pace of such threats (e.g., new malware variants introduced every day). There has been great interest in exploring the use of machine learning techniques in automating and enhancing the effectiveness of malware detection and analysis. In this paper, we present a deep recurrent neural network solution: a stacked Long Short-Term Memory (LSTM) network with pre-training as a regularization method to avoid random network initialization. In our proposal, we use global and short dependencies of the inputs. With pre-training, we avoid random initialization and are able to improve the accuracy and robustness of malware threat hunting. The proposed method speeds up convergence (in comparison to a stacked LSTM) by reducing the length of malware OpCode or bytecode sequences. Hence, the complexity of our final method is reduced. This leads to better accuracy, a higher Matthews Correlation Coefficient (MCC), and a higher Area Under the Curve (AUC) in comparison to a standard LSTM with similar detection time. Our proposed method can be applied in real-time malware threat hunting, particularly for safety-critical systems such as eHealth or the Internet of Military Things, where poor convergence of the model could lead to catastrophic consequences. We evaluate the effectiveness of our proposed method on Windows, Ransomware, Internet of Things (IoT), and Android malware datasets using both static and dynamic analysis. For IoT malware detection, we also present a comparative summary of the performance of our proposed method and the standard stacked LSTM method on an IoT-specific dataset. More specifically, our proposed method achieves an accuracy of 99.1% in detecting IoT malware samples, with an AUC of 0.985 and an MCC of 0.95, thus outperforming standard LSTM-based methods in these key metrics.
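
The abstract above reports the Matthews Correlation Coefficient (MCC) alongside accuracy and AUC. As a reminder of what that metric measures, here is a minimal sketch of the standard binary MCC formula. The function name and the confusion-matrix counts are illustrative assumptions and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when any marginal count is zero.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for illustration only (not results from the paper).
print(round(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10), 3))
```

Unlike accuracy, MCC stays near 0 for a classifier that only predicts the majority class, which is why it is often reported for imbalanced malware datasets.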

A Study on Image Creation and Modification Techniques Using Generative Adversarial Neural Networks (생성적 적대 신경망을 활용한 부분 위변조 이미지 생성에 관한 연구)

  • Song, Seong-Heon;Choi, Bong-Jun;Moon, M-Ikyeong
    • The Journal of the Korea Institute of Electronic Communication Sciences, v.17 no.2, pp.291-298, 2022
  • A generative adversarial network (GAN) is a network in which two internal neural networks (a generator network and a discriminator network) learn while competing with each other. The generator creates images close to reality, and the discriminator is trained to better distinguish the generator's images from real ones. This technology is being used in various ways to create, transform, and restore an entire image X into another image Y. This paper describes a method for naturally forging part of an image into another object after extracting only a partial image from the original image. First, after extracting only a partial image from the original image, a new image is created by a previously trained DCGAN model. The new image is then restyled with a style transfer technique to match the texture and size of the original image and is naturally combined with it. Through this study, the user can naturally add or transform a desired object image in a specific part of the original image, so the method can serve as another application area for creating fake images.

Graph-Based Word Sense Disambiguation Using Iterative Approach (반복적 기법을 사용한 그래프 기반 단어 모호성 해소)

  • Kang, Sangwoo
    • The Journal of Korean Institute of Next Generation Computing, v.13 no.2, pp.102-110, 2017
  • Current word sense disambiguation techniques employ various machine learning-based methods. Various approaches have been proposed to address this problem, including the knowledge base approach. This approach defines the sense of an ambiguous word in accordance with knowledge base information with no training corpus. In unsupervised learning techniques that use a knowledge base approach, graph-based and similarity-based methods have been the main research areas. The graph-based method has the advantage of constructing a semantic graph that delineates all paths between different senses that an ambiguous word may have. However, unnecessary semantic paths may be introduced, thereby increasing the risk of errors. To solve this problem and construct a fine-grained graph, in this paper, we propose a model that iteratively constructs the graph while eliminating unnecessary nodes and edges, i.e., senses and semantic paths. The hybrid similarity estimation model was applied to estimate a more accurate sense in the constructed semantic graph. Because the proposed model uses BabelNet, a multilingual lexical knowledge base, the model is not limited to a specific language.

Repeated K-means Clustering Algorithm For Radar Sorting (레이더 군집화를 위한 반복 K-means 클러스터링 알고리즘)

  • Dong Hyun Park;Dong-ho Seo;Jee-hyeon Baek;Won-jin Lee;Dong Eui Chang
    • Journal of the Korea Institute of Military Science and Technology, v.26 no.5, pp.384-391, 2023
  • In modern electronic warfare, a number of radar emitters are in operation, causing radar receivers to receive high-density signal pulses that occur simultaneously. To analyze the radar signals more accurately and identify enemies, sorting the high-density radar signals before analysis is very important. Recently, machine learning algorithms, specifically K-means clustering, have been studied as a way to improve the accuracy of radar signal sorting. One of the challenges faced by these studies is that the clustering results can vary depending on how the initial points are selected and how many clusters are set. This paper introduces a repeated K-means clustering algorithm that aims to accurately cluster all data by identifying and addressing false clusters in the radar sorting problem. To verify the performance of the proposed algorithm, experiments are conducted by applying it to simulated signals produced by a signal generator.
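
The sensitivity to initial points mentioned above can be illustrated with a small sketch. This is not the paper's repeated K-means algorithm (the abstract does not specify how false clusters are identified); it is only the generic random-restart strategy, which reruns plain Lloyd's K-means from several random initializations and keeps the run with the lowest within-cluster sum of squares. All function names and the toy data are assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def mean_point(cluster):
    """Centroid of a non-empty list of 2D points."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)


def kmeans(points, k, rng, iterations=50):
    """Plain Lloyd's K-means; returns (centroids, inertia)."""
    centroids = rng.sample(points, k)  # random initial points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [mean_point(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    inertia = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, inertia


def best_of_restarts(points, k, restarts=10, seed=0):
    """Run K-means several times and keep the lowest-inertia result."""
    rng = random.Random(seed)
    runs = [kmeans(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])


# Toy 2D points standing in for pulse features (hypothetical data).
pulses = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
centroids, inertia = best_of_restarts(pulses, k=3)
print(centroids, inertia)
```

Restarts reduce the effect of a bad initialization but do not address a wrongly chosen number of clusters, which is the second difficulty the abstract names.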

Arabic Stock News Sentiments Using the Bidirectional Encoder Representations from Transformers Model

  • Eman Alasmari;Mohamed Hamdy;Khaled H. Alyoubi;Fahd Saleh Alotaibi
    • International Journal of Computer Science & Network Security, v.24 no.2, pp.113-123, 2024
  • Stock market news sentiment analysis (SA) aims to identify the attitudes that stock news on official platforms expresses toward companies' stocks. It supports sound investment decisions and analysts' evaluations. However, research on Arabic SA is limited compared to that on English SA due to the complexity and limited corpora of the Arabic language. This paper develops a sentiment classification model to predict the polarity of Arabic stock news in microblogs. It also aims to extract the reasons that lead to the polarity categorization, expressed as the main economic causes or aspects based on semantic units. To this end, this paper presents an Arabic SA approach based on the logistic regression model and the Bidirectional Encoder Representations from Transformers (BERT) model. The proposed model is used to classify articles as positive, negative, or neutral. It was trained on data collected from an official Saudi stock market article platform that was later preprocessed and labeled. Moreover, the economic reasons for the articles, based on semantic units and divided into seven economic aspects to highlight the polarity of the articles, were investigated. The supervised BERT model obtained 88% article classification accuracy based on SA, and the unsupervised mean Word2Vec encoder obtained 80% economic-aspect clustering accuracy. Predicting the polarity of Arabic stock market news and its economic reasons would provide valuable benefits to the stock SA field.

Deep Learning Approach for Automatic Discontinuity Mapping on 3D Model of Tunnel Face (터널 막장 3차원 지형모델 상에서의 불연속면 자동 매핑을 위한 딥러닝 기법 적용 방안)

  • Chuyen Pham;Hyu-Soung Shin
    • Tunnel and Underground Space, v.33 no.6, pp.508-518, 2023
  • This paper presents a new approach for the automatic mapping of discontinuities in a tunnel face based on its 3D digital model reconstructed by LiDAR scanning or photogrammetry. The main idea is to identify discontinuity areas in the 3D digital model of a tunnel face by segmenting its 2D projected images using a deep-learning semantic segmentation model called U-Net. The proposed deep learning model integrates various features, including the projected RGB image, the depth map image, and images based on local surface properties, i.e., normal vector and curvature images, to effectively segment areas of discontinuity in the images. Subsequently, the segmentation results are projected back onto the 3D model using depth maps and projection matrices to obtain an accurate representation of the location and extent of discontinuities in 3D space. The performance of the segmentation model is evaluated by comparing the segmented results with their corresponding ground truths, which demonstrates high segmentation accuracy, with an intersection-over-union metric of approximately 0.8. Although the training data are still limited, this method shows promising potential to address the limitations of conventional approaches, which rely only on normal vectors and unsupervised machine learning algorithms for grouping points in the 3D model into distinct sets of discontinuities.
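
The intersection-over-union (IoU) metric cited above compares a predicted segmentation mask with its ground truth. The following is a minimal sketch of the standard definition for binary masks. The function name, the list-of-lists mask representation, the toy masks, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration.

```python
def intersection_over_union(pred, truth):
    """IoU between two binary masks given as equally sized 2D lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1  # pixel marked in both masks
            if p or t:
                union += 1         # pixel marked in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Hypothetical 2x3 masks: 1 marks a discontinuity pixel.
predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]
print(intersection_over_union(predicted, ground_truth))  # 2 shared / 4 total = 0.5
```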

Analysis of Characteristics of Clusters of Middle School Students Using K-Means Cluster Analysis (K-평균 군집분석을 활용한 중학생의 군집화 및 특성 분석)

  • Jaebong, Lee
    • Journal of The Korean Association For Science Education, v.42 no.6, pp.611-619, 2022
  • The purpose of this study is to explore the possibility of applying big data analysis to evaluation data in science education in order to provide appropriate feedback to students, at a time when interest in educational data mining is increasing. In this study, we use the evaluation data of 2,576 students who answered 24 questions of the national assessment of educational achievement. We use K-means cluster analysis as an unsupervised machine learning method for clustering. As a result of clustering, students were divided into six clusters. The middle-ranking students are divided into more clusters than the upper- or lower-ranking students. According to the results of the cluster analysis, the most important factor influencing clustering is academic achievement, and each cluster shows different characteristics in terms of content domains, subject competencies, and affective characteristics. Learning motivation is important among the affective domains in the lower-ranking achievement cluster, and scientific inquiry and problem-solving competency, as well as scientific communication competency, have a major influence in terms of subject competencies. In the content domain, achievement in motion and energy and in matter are important factors that distinguish the characteristics of the clusters. As a result, we can provide students with customized feedback for learning based on the characteristics of each cluster. We discuss implications of these results for science education, such as the possibility of using the results of this study, balanced learning across content domains, enhancement of subject competency, and improvement of scientific attitude.

Empirical Research on Search model of Web Service Repository (웹서비스 저장소의 검색기법에 관한 실증적 연구)

  • Hwang, You-Sub
    • Journal of Intelligence and Information Systems, v.16 no.4, pp.173-193, 2010
  • The World Wide Web is transitioning from being a mere collection of documents that contain useful information toward providing a collection of services that perform useful tasks. The emerging Web service technology has been envisioned as the next technological wave and is expected to play an important role in this recent transformation of the Web. By providing interoperable interface standards for application-to-application communication, Web services can be combined with component-based software development to promote application interaction and integration within and across enterprises. To make Web services for service-oriented computing operational, it is important that Web services repositories not only be well-structured but also provide efficient tools for an environment supporting reusable software components for both service providers and consumers. As the potential of Web services for service-oriented computing is becoming widely recognized, the demand for an integrated framework that facilitates service discovery and publishing is concomitantly growing. In our research, we propose a framework that facilitates Web service discovery and publishing by combining clustering techniques and leveraging the semantics of the XML-based service specification in WSDL files. We believe that this is one of the first attempts at applying unsupervised artificial neural network-based machine-learning techniques in the Web service domain. We have developed a Web service discovery tool based on the proposed approach using an unsupervised artificial neural network and empirically evaluated the proposed approach and tool using real Web service descriptions drawn from operational Web services repositories. We believe that both service providers and consumers in a service-oriented computing environment can benefit from our Web service discovery approach.

An Improved Homonym Disambiguation Model based on Bayes Theory (Bayes 정리에 기반한 개선된 동형이의어 분별 모델)

  • 김창환;이왕우
    • Journal of the Korea Computer Industry Society, v.2 no.12, pp.1581-1590, 2001
  • This paper proposes a WSD (word sense disambiguation) model that improves on the WSD model of J. Hur (2000). The proposed model is an improved statistical homonym disambiguation model based on Bayes' theorem. It uses semantic information (co-occurrence data) obtained from the definitions in the part-of-speech (POS) tagged UMRD-S (Ulsan University Machine Readable Dictionary, Semantic Tagged). We extracted nouns, predicates, and adverbs from the definitions in the Korean dictionary as semantic features of the context. We measured the accuracy of the WSD system on nine major homonym nouns and, additionally, on seven new homonym predicates. The internal experiment showed an average accuracy of 98.32% for the nine homonym nouns and 99.53% for the seven homonym predicates. In addition, we tested the system on the Korean Information Base and ETRI's POS-tagged corpus. This external experiment showed an average accuracy of 84.42% for the nine nouns on unsupervised learning sentences from the Korean Information Base and the ETRI corpus, and an accuracy of 70.81% for the seven predicates on the Sejong Project phrase-tagged corpus (3.5 million phrases).
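
To make the idea of Bayes-based disambiguation from co-occurrence data concrete, here is a minimal naive Bayes sketch. It is not the model from the paper: the class name, the add-one (Laplace) smoothing, the independence assumption between context words, and the English toy examples are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class BayesSenseClassifier:
    """Naive Bayes over context words: argmax P(sense) * prod P(word | sense)."""

    def __init__(self):
        self.sense_counts = Counter()            # training examples per sense
        self.word_counts = defaultdict(Counter)  # co-occurrence counts per sense
        self.vocabulary = set()

    def train(self, examples):
        """examples: iterable of (sense, list of context words)."""
        for sense, context in examples:
            self.sense_counts[sense] += 1
            for word in context:
                self.word_counts[sense][word] += 1
                self.vocabulary.add(word)

    def predict(self, context):
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocabulary)

        def log_posterior(sense):
            score = math.log(self.sense_counts[sense] / total)
            denominator = sum(self.word_counts[sense].values()) + vocab_size
            for word in context:
                # Add-one smoothing (an assumption) avoids log(0) for unseen words.
                score += math.log((self.word_counts[sense][word] + 1) / denominator)
            return score

        return max(self.sense_counts, key=log_posterior)


# Hypothetical training data for the English homonym "bank".
classifier = BayesSenseClassifier()
classifier.train([
    ("bank/finance", ["money", "loan", "deposit"]),
    ("bank/finance", ["account", "money"]),
    ("bank/river", ["water", "shore", "fishing"]),
    ("bank/river", ["river", "water"]),
])
print(classifier.predict(["money", "account"]))
```

In a dictionary-based setting such as the one described in the abstract, the training examples would come from the words co-occurring in each sense's definition rather than from hand-labeled sentences.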
