• Title/Summary/Keyword: Topic Detection

Malicious Codes Re-grouping Methods using Fuzzy Clustering based on Native API Frequency (Native API 빈도 기반의 퍼지 군집화를 이용한 악성코드 재그룹화 기법연구)

  • Kwon, O-Chul;Bae, Seong-Jae;Cho, Jae-Ik;Moon, Jung-Sub
    • Journal of the Korea Institute of Information Security & Cryptology / v.18 no.6A / pp.115-127 / 2008
  • The Native API is a system call interface that can only be accessed with administrator privileges. It can be used to detect a variety of malicious codes that can only run with administrator authority, so much research is being done on detection methods that use the characteristics of the Native API. Most of this research relies on supervised machine learning methods. However, the classification standards of anti-virus companies do not reflect the characteristics of the Native API, so the population data used for supervised learning are not accurate. More research is therefore needed on classification standards that use the Native API for detection. This paper proposes a method for re-grouping malicious codes using fuzzy clustering based on Native API frequency. The proposed re-grouping method is evaluated by using machine learning to compare its detection rates with those of previous classification methods.
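
The abstract above does not say which fuzzy clustering algorithm was used. As a rough illustration only, the sketch below runs standard fuzzy c-means on made-up API call-frequency vectors; the sample values, the fuzzifier `m`, and the initial centers are all assumptions, not the authors' setup.

```python
import math

def fuzzy_memberships(points, centers, m=2.0):
    """Return u[i][j], the degree to which points[i] belongs to centers[j]."""
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # A point lying exactly on a center belongs fully to that center.
            row = [1.0 if d == 0.0 else 0.0 for d in dists]
            total = sum(row)
            row = [r / total for r in row]
        else:
            power = 2.0 / (m - 1.0)
            row = [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]
        memberships.append(row)
    return memberships

def update_centers(points, memberships, m=2.0):
    """Recompute each center as the membership-weighted mean of the points."""
    dim = len(points[0])
    centers = []
    for j in range(len(memberships[0])):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers

def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    for _ in range(iterations):
        u = fuzzy_memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, fuzzy_memberships(points, centers, m)

# Hypothetical data: call counts of two API functions per malware sample.
samples = [(9.0, 1.0), (8.0, 2.0), (1.0, 9.0), (2.0, 8.0)]
centers, u = fuzzy_c_means(samples, centers=[(9.0, 1.0), (1.0, 9.0)])
```

Each row of `u` sums to 1, so a sample can belong partly to several groups, which is what distinguishes fuzzy clustering from a hard assignment.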

Data anomaly detection for structural health monitoring of bridges using shapelet transform

  • Arul, Monica;Kareem, Ahsan
    • Smart Structures and Systems / v.29 no.1 / pp.93-103 / 2022
  • With the wider availability of sensor technology through easily affordable sensor devices, several Structural Health Monitoring (SHM) systems are deployed to monitor vital civil infrastructure. The continuous monitoring provides valuable information about the health of the structure that can help provide a decision support system for retrofits and other structural modifications. However, when the sensors are exposed to harsh environmental conditions, the data measured by the SHM systems tend to be affected by multiple anomalies caused by faulty or broken sensors. Given a deluge of high-dimensional data collected continuously over time, research into using machine learning methods to detect anomalies is a topic of great interest to the SHM community. This paper contributes to this effort by proposing a relatively new time series representation named "Shapelet Transform" in combination with a Random Forest classifier to autonomously identify anomalies in SHM data. The shapelet transform is a unique time series representation based solely on the shape of the time series data. Considering the individual characteristics unique to every anomaly, the application of this transform yields a new shape-based feature representation that can be combined with any standard machine learning algorithm to detect anomalous data with no manual intervention. For the present study, the anomaly detection framework consists of three steps: identifying unique shapes from anomalous data, using these shapes to transform the SHM data into a local-shape space, and training machine learning algorithms on this transformed data to identify anomalies. The efficacy of this method is demonstrated by the identification of anomalies in acceleration data from an SHM system installed on a long-span bridge in China. The results show that multiple data anomalies in SHM data can be automatically detected with high accuracy using the proposed method.
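
The second step of the framework, mapping a signal into a shape-based feature space, can be sketched as the minimum distance between each shapelet and every window of the signal. The shapelets and signals below are invented for illustration, and the sketch skips both shapelet discovery and the window normalization that practical implementations often apply.

```python
import math

def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any window of the series."""
    width = len(shapelet)
    best = math.inf
    for start in range(len(series) - width + 1):
        window = series[start:start + width]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, shapelet)))
        best = min(best, dist)
    return best

def shapelet_transform(dataset, shapelets):
    """Turn each series into one feature per shapelet (its best-match distance)."""
    return [[shapelet_distance(s, sh) for sh in shapelets] for s in dataset]

# Hypothetical shapelets: a short spike and a constant segment.
spike = [0.0, 5.0, 0.0]
flat = [1.0, 1.0, 1.0]

# Hypothetical sensor signals.
signals = [
    [0.0, 0.1, 0.0, 5.0, 0.0, 0.1],  # contains the spike shape
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # constant reading
]
features = shapelet_transform(signals, [spike, flat])
```

The rows of `features` are fixed-length vectors, so they could be passed to any standard classifier, such as the Random Forest named in the abstract.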

GCNXSS: An Attack Detection Approach for Cross-Site Scripting Based on Graph Convolutional Networks

  • Pan, Hongyu;Fang, Yong;Huang, Cheng;Guo, Wenbo;Wan, Xuelin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.12 / pp.4008-4023 / 2022
  • Since machine learning was introduced into cross-site scripting (XSS) attack detection, many researchers have conducted related studies and achieved significant results, such as saving time and labor costs by not maintaining a rule database, which is required by traditional XSS attack detection methods. However, this line of research has run into problems such as poor generalization ability and significant false negative rates (FNR) and false positive rates (FPR). Moreover, the automatic clustering property of graph convolutional networks (GCN) has attracted the attention of researchers. In the field of natural language processing (NLP), the results of graph embedding based on GCN are automatically clustered in space without any training, which means that text data can be classified just by the embedding process based on GCN. Previously, other methods required training with the help of labeled data after embedding to complete data classification. With the help of the GCN auto-clustering feature and labeled data, this research proposes an approach to detect XSS attacks (called GCNXSS) that mines the dependencies between the units that constitute an XSS payload. First, GCNXSS transforms a URL into a homogeneous word graph based on word co-occurrence relationships. Then, GCNXSS inputs the graph into the GCN model for graph embedding and obtains the classification results. Experimental results show that GCNXSS achieved accuracy, precision, recall, F1-score, FNR, FPR, and prediction time of 99.97%, 99.75%, 99.97%, 99.86%, 0.03%, 0.03%, and 0.0461 ms, respectively. Compared with existing methods, GCNXSS has a lower FNR and FPR with stronger generalization ability.
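
The first GCNXSS step, turning a URL into a word co-occurrence graph, might look roughly like the sketch below. The tokenizer, the window size of 2, and the example URL are assumptions made for illustration; the abstract does not specify them, and the GCN embedding step is not shown.

```python
import re
from collections import defaultdict
from urllib.parse import unquote

def tokenize(url):
    """Decode the URL, then split it into lowercase words and single punctuation marks."""
    return re.findall(r"[a-z0-9_]+|[^\sa-z0-9_]", unquote(url).lower())

def cooccurrence_graph(tokens, window=2):
    """Undirected graph: nodes are tokens, edge weights count co-occurrences within the window."""
    edges = defaultdict(int)
    for i, left in enumerate(tokens):
        for right in tokens[i + 1:i + 1 + window]:
            if left != right:
                edges[tuple(sorted((left, right)))] += 1
    return dict(edges)

# Hypothetical URL carrying a percent-encoded script payload.
url = "http://example.com/search?q=%3Cscript%3Ealert(1)%3C/script%3E"
tokens = tokenize(url)
graph = cooccurrence_graph(tokens)
```

Here `graph` maps token pairs to weights, for example the pair `("<", "script")`, and could be converted into an adjacency matrix for a graph model.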

Detection of Depression Trends in Literary Cyber Writers Using Sentiment Analysis and Machine Learning

  • Faiza Nasir;Haseeb Ahmad;CM Nadeem Faisal;Qaisar Abbas;Mubarak Albathan;Ayyaz Hussain
    • International Journal of Computer Science & Network Security / v.23 no.3 / pp.67-80 / 2023
  • Nowadays, psychologists consider social media an important tool to examine mental disorders. Among these disorders, depression is one of the most common yet least cured diseases. Since many writers with extensive followings express their feelings on social media and depression is increasing significantly, exploring the literary text shared on social media may provide multidimensional features of depressive behaviors: (1) Background: Several studies have observed that depressive data contains certain language styles and self-expressing pronouns, but the current study provides evidence that posts with self-expressing pronouns and depressive language styles contain high emotional temperatures. Therefore, the main objective of this study is to examine literary cyber writers' posts to discover symptomatic signs of depression. For this purpose, our research focuses on extracting data from writers' public social media pages, blogs, and communities; (3) Results: To examine the emotional temperatures and sentence usage of the depressive and non-depressive groups, we employed the SentiStrength algorithm as a psycholinguistic method, TF-IDF and N-Gram for ranked phrase extraction, and Latent Dirichlet Allocation for topic modelling of the extracted phrases. The results reveal a strong connection between depression and negative emotional temperatures in writers' posts. Moreover, we used Naïve Bayes, Support Vector Machines, Random Forest, and Decision Tree algorithms to validate the classification of depressive and non-depressive in terms of sentences, phrases and topics. The results reveal that, compared with the others, the Support Vector Machines algorithm validates the classification while attaining the highest F-score of 79%; (4) Conclusions: Experimental results show that the proposed system performed well in detecting depression trends in literary cyber writers using sentiment analysis.
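
To make the ranked phrase extraction concrete, the sketch below scores word bigrams with a plain TF-IDF formula. The posts, the whitespace tokenizer, and the unsmoothed logarithm are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-word phrases in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tf_idf(documents, n=2):
    """Score each n-gram in each document by term frequency times inverse document frequency."""
    grams_per_doc = [Counter(ngrams(doc.lower().split(), n)) for doc in documents]
    # Number of documents in which each n-gram appears at least once.
    doc_freq = Counter(g for grams in grams_per_doc for g in grams)
    scores = []
    for grams in grams_per_doc:
        total = sum(grams.values())
        scores.append({
            g: (count / total) * math.log(len(documents) / doc_freq[g])
            for g, count in grams.items()
        })
    return scores

# Hypothetical posts.
posts = [
    "i feel so alone tonight",
    "i feel so happy today",
    "the weather is nice today",
]
scores = tf_idf(posts)
```

In `scores[0]`, the phrase "so alone" outranks "i feel" because the latter also appears in the second post and is therefore less distinctive.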

Visual Analytics using Topic Composition for Predicting Event Flow (토픽의 조합으로 이벤트 흐름을 예측하기 위한 시각적 분석 시스템)

  • Yeon, Hanbyul;Kim, Seokyeon;Jang, Yun
    • KIISE Transactions on Computing Practices / v.21 no.12 / pp.768-773 / 2015
  • Emergence events are the cause of much economic damage. In order to minimize the damage that these events cause, it must be possible to predict what will happen in the future. Accordingly, many researchers have focused on real-time monitoring, detecting events, and investigating events. In addition, there have also been many studies on predictive analysis for forecasting of future trends. However, most studies provide future tendency per event without contextual compositive analysis. In this paper, we present a predictive visual analytics system using topic composition to provide future trends per event. We first extract abnormal topics from social media data to find interesting and unexpected events. We then search for similar emergence patterns in the past. Relevant topics in the past are provided by news media data. Finally, the user combines the relevant topics and a new context is created for contextual prediction. In a case study, we demonstrate our visual analytics system with two different cases and validate our system with possible predictive story lines.

Research on Design of DDS-based Conventional Railway Signal Data Specification for Real-time Railway Safety Monitoring and Control (실시간 철도 안전관제를 위한 DDS 기반의 일반철도 신호 데이터 규격 설계 연구)

  • Park, Yunjung;Lim, Damsub;Min, Dugki;Kim, Sang Ahm
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.4 / pp.739-746 / 2016
  • The real-time railway safety monitoring and control system aims to prevent safety accidents, and it adopts a data transmission method based on the DDS (Data Distribution Service) standard to support integrated management of data from existing on-site safety detection devices. In this paper, we introduce the design of a DDS-based data specification for on-site signal equipment on the conventional railway. For this, we (1) design a UML data model of the KRS SG 0062 standard, which defines the existing data specification, (2) define DDS Topics for DDS transmission and map the KRS model to the DDS Topic model, (3) suggest data transformation rules, and (4) design network control QoS policies. In addition, we analyze actual on-site log data and validate our data specification design. DDS-based data transmission enables data compatibility among on-site devices and the real-time railway safety monitoring and control system, and allows efficient network management for large amounts of data transfer.

Analysis of Korea's Artificial Intelligence Competitiveness Based on Patent Data: Focusing on Patent Index and Topic Modeling (특허데이터 기반 한국의 인공지능 경쟁력 분석 : 특허지표 및 토픽모델링을 중심으로)

  • Lee, Hyun-Sang;Qiao, Xin;Shin, Sun-Young;Kim, Gyu-Ri;Oh, Se-Hwan
    • Informatization Policy / v.29 no.4 / pp.43-66 / 2022
  • With the development of artificial intelligence technology, competition for artificial intelligence technology patents around the world is intensifying. During the period 2000 ~ 2021, artificial intelligence technology patent applications at the US Patent and Trademark Office have been steadily increasing, and the growth rate has been steeper since the 2010s. As a result of analyzing Korea's artificial intelligence technology competitiveness through patent indices, it is evaluated that patent activity, impact, and marketability are superior in areas such as auditory intelligence and visual intelligence. However, compared to other countries, overall Korea's artificial intelligence technology patents are good in terms of activity and marketability, but somewhat inferior in technological impact. While noise canceling and voice recognition have recently decreased as topics for artificial intelligence, growth is expected in areas such as model learning optimization, smart sensors, and autonomous driving. In the case of Korea, efforts are required as there is a slight lack of patent applications in areas such as fraud detection/security and medical vision learning.

A review of ground camera-based computer vision techniques for flood management

  • Sanghoon Jun;Hyewoon Jang;Seungjun Kim;Jong-Sub Lee;Donghwi Jung
    • Computers and Concrete / v.33 no.4 / pp.425-443 / 2024
  • Floods are among the most common natural hazards in urban areas. To mitigate the problems caused by flooding, unstructured data such as images and videos collected from closed circuit televisions (CCTVs) or unmanned aerial vehicles (UAVs) have been examined for flood management (FM). Many computer vision (CV) techniques have been widely adopted to analyze imagery data. Although some papers have reviewed recent CV approaches that utilize UAV images or remote sensing data, less effort has been devoted to studies that have focused on CCTV data. In addition, few studies have distinguished between the main research objectives of CV techniques (e.g., flood depth and flooded area) for a comprehensive understanding of the current status and trends of CV applications for each FM research topic. Thus, this paper provides a comprehensive review of the literature that proposes CV techniques for aspects of FM using ground camera (e.g., CCTV) data. Research topics are classified into four categories: flood depth, flood detection, flooded area, and surface water velocity. These application areas are subdivided into three types: urban, river and stream, and experimental. The adopted CV techniques are summarized for each research topic and application area. The primary goal of this review is to provide guidance for researchers who plan to design a CV model for specific purposes such as flood-depth estimation. Researchers should be able to draw on this review to construct an appropriate CV model for any FM purpose.

A Study on the Fraud Detection in an Online Second-hand Market by Using Topic Modeling and Machine Learning (토픽 모델링과 머신 러닝 방법을 이용한 온라인 C2C 중고거래 시장에서의 사기 탐지 연구)

  • Dongwoo Lee;Jinyoung Min
    • Information Systems Review / v.23 no.4 / pp.45-67 / 2021
  • As the transaction volume of the C2C second-hand market grows, the number of frauds, which aim to earn unfair gains by sending products different from those specified or by not sending them to buyers at all, is also increasing. This study explores a model that can identify frauds in the online C2C second-hand market by examining the postings for transactions. To this end, the study collected 145,536 field data records from an actual C2C second-hand market. The model is built with characteristics drawn from postings, such as the topic and linguistic characteristics of the product description, and the characteristics of products, postings, sellers, and transactions. The constructed model is then trained with the machine learning algorithm XGBoost. The final analysis results show that fraudulent postings contain less information, which is also less specific, fewer nouns and images, a higher ratio of numbers and white space, and a shorter length than genuine postings. Also, while genuine postings focus on product information for nouns, delivery information for verbs, and actions for adjectives, fraudulent postings did not show those characteristics. This study shows that various features can be extracted from postings written in C2C second-hand transactions and used to construct an effective model for detecting fraud. The proposed model can also be considered and applied to other C2C platforms. Overall, the model proposed in this study can be expected to have positive effects on suppressing and preventing fraudulent behavior in online C2C markets.
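
A few of the surface features named above (length, share of digits, share of white space) can be computed directly from the posting text, as in the sketch below. The feature names and the example posting are hypothetical, and the topic features and the XGBoost training step are not shown.

```python
def posting_features(text):
    """Compute simple surface features of a posting's text."""
    length = len(text)
    digits = sum(ch.isdigit() for ch in text)
    spaces = sum(ch.isspace() for ch in text)
    return {
        "length": length,
        # Guard against empty postings to avoid division by zero.
        "digit_ratio": digits / length if length else 0.0,
        "whitespace_ratio": spaces / length if length else 0.0,
    }

# Hypothetical posting text.
features = posting_features("iPhone 12  call 010 1234")
```

A dictionary like `features` would form one row of the training table, alongside the seller, product, and transaction characteristics the abstract mentions.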

Development of an Algorithm for the Embryo Location of Seed by using Machine Vision (기계시각을 이용한 대립종자의 씨눈위치 판정알고리즘 개발)

  • 김동억;손재룡;장유섭;장익주
    • Journal of Bio-Environment Control / v.13 no.2 / pp.90-95 / 2004
  • This study was conducted to develop an algorithm for locating the embryo in a seed by using machine vision. The aim of this research is to detect the embryo location regardless of the seed supply direction. In order to detect the embryo location in Chambak, Tuktojwa and Hukjong, the effect of seed posture in the supply line was investigated. When the seed posture angle of Chambak from the horizontal direction was $30^{\circ}$, the detection accuracy for embryo location was 77.8%, while it was 100% for $0^{\circ}$ and $15^{\circ}$. When the seed posture angle of Tuktojwa was $30^{\circ}$ from the horizontal direction, the detection accuracy was 89.5%, and it was 100% for $0^{\circ}$ and $15^{\circ}$. Embryo location detection accuracy for Hukjong was 94.4% when the seed posture angle from the horizontal direction was $30^{\circ}$, and it was 100% for $0^{\circ}$ and $15^{\circ}$. When seeds are fed into the posturing and seeding line, mechanical means keep the seed posture within $30^{\circ}$, and in most cases within $15^{\circ}$, so the developed algorithm can detect the embryo position in the seed. This embryo detection system is therefore a very useful tool in the posturing and seeding line.