• Title/Summary/Keyword: clustering problem

Anomaly Detection Analysis using Repository based on Inverted Index (역방향 인덱스 기반의 저장소를 이용한 이상 탐지 분석)

  • Park, Jumi;Cho, Weduke;Kim, Kangseok
    • Journal of KIISE / v.45 no.3 / pp.294-302 / 2018
  • With the emergence of new service industries driven by advances in information and communication technology, cyberspace risks such as personal information infringement and leakage of industrial secrets have diversified, and security has emerged as a critical issue. In this paper, we propose a behavior-based anomaly detection method suited to real-time analysis of large volumes of data. Using large-volume user log data related to in-company personal information abuse and internal information leakage, we show that the proposed method is superior to existing signature-based security countermeasures. Because the proposed method requires a technique for processing large amounts of data, we use Elasticsearch, a real-time search engine based on an inverted index. In addition, statistics-based frequency analysis and preprocessing were performed for data analysis, and the DBSCAN algorithm, a density-based clustering method, was applied to classify abnormal data, with an example visualized for easy analysis. Unlike existing anomaly detection systems, the proposed behavior-based technique is promising because it enables anomaly detection analysis without setting a threshold value separately, and it was proposed from a statistical perspective.
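The DBSCAN step named in this abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' pipeline: the sample feature vectors, the `eps` and `min_pts` values, and the function names are assumptions made for illustration. Points that DBSCAN leaves as noise are treated here as anomaly candidates.

```python
import math

NOISE = -1  # label for points that belong to no dense region


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already visited
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels


# Hypothetical per-user features, e.g. (normalized logins, normalized downloads)
sample = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
          (9.0, 0.5)]
sample_labels = dbscan(sample, eps=0.5, min_pts=3)
anomalies = [p for p, lab in zip(sample, sample_labels) if lab == NOISE]
print(anomalies)
```

In this sketch the isolated point has no neighbors within `eps`, so it is never absorbed into a cluster and is reported as an anomaly candidate.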

Implementation of a Layer-7 Web Clustering System on Linux with Performance Enhancements via Recognition of User Request Rate Variations (리눅스에서 레이어-7 웹 클러스터링 시스템의 구현 및 사용자 요청률 차이의 인식에 기반한 성능 개선)

  • Hong Il-gu;Noh Sam H.
    • Journal of KIISE:Information Networking / v.32 no.1 / pp.68-79 / 2005
  • The popularity of Web services is ever increasing. As the number of services and clients continues to increase, providing a system that scales with this increase is becoming more difficult. A costly and ineffective method is to buy a new, more powerful system every time the load becomes unbearable. A more cost-effective solution is to expand the system as the need arises. This is the approach taken in Web cluster systems. However, providing effective scalability in a Web cluster system is still an open issue. In this study, we implement a Web cluster system based on the Layer 7 switching technique on Linux. The implementation is based on a design proposed and implemented by Aron et al., but on FreeBSD. Though the design is the same, due to the vast differences between FreeBSD and Linux, the implementation presented in this paper is totally new. We also propose the Dual Scheduling (DS) load distribution algorithm, which distributes requests to the system resources by observing variations in the request rate. We show through measurements on our implementation that the DS algorithm performs considerably better than previous algorithms.

Generic Document Summarization using Coherence of Sentence Cluster and Semantic Feature (문장군집의 응집도와 의미특징을 이용한 포괄적 문서요약)

  • Park, Sun;Lee, Yeonwoo;Shim, Chun Sik;Lee, Seong Ro
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.12 / pp.2607-2613 / 2012
  • The results of generic summarization based on inherent knowledge are influenced by the composition of sentences in the document set. To resolve this problem, this paper proposes a new generic document summarization method that uses clustering of the semantic features of documents and the coherence of document clusters. The proposed method clusters sentences using semantic features derived from NMF (non-negative matrix factorization), and it can identify document topic groups because the inherent structure of the documents is well represented by the sentence clusters. In addition, the method can improve the quality of summarization because important sentences are extracted using the coherence of sentence clusters and the clusters are refined by re-clustering. The experimental results demonstrate that applying the proposed method to generic summarization achieves better performance than other generic document summarization methods.

Fast k-NN based Malware Analysis in a Massive Malware Environment

  • Hwang, Jun-ho;Kwak, Jin;Lee, Tae-jin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.12 / pp.6145-6158 / 2019
  • Responding to the large number of indiscriminately distributed malicious codes, as well as to intelligent APT attacks, is a challenge for the current security industry. As a result, studies using machine learning algorithms are being conducted for proactive prevention rather than post-processing. The k-NN algorithm is widely used because it is intuitive and suitable for handling malicious code as unstructured data. In addition, in the malicious code analysis domain, the k-NN algorithm makes it easy to classify malicious codes based on previously analyzed malicious codes. For example, it is possible to classify malicious code families or analyze malicious code variants through similarity analysis with existing malicious codes. However, the main disadvantage of the k-NN algorithm is that the search time increases as the amount of learning data increases. We propose a fast k-NN algorithm that addresses this computation speed problem while retaining the advantages of the k-NN algorithm. In the test environment, the algorithm needed an average of only 19.71 similarity comparisons for 6.25 million malicious codes. Considering the way the algorithm works, the fast k-NN algorithm can also be used to search any data that can be vectorized, in addition to malware and SSDEEP. In the future, where a k-NN approach is needed and the central node can be selected effectively for clustering large amounts of data in various environments, it is expected that a sophisticated machine-learning-based system can be designed.
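For context, the baseline cost that this abstract describes can be seen in a brute-force k-NN classifier, sketched below. This is plain k-NN, not the paper's fast variant, whose details are not given here. The feature vectors, the family labels "A" and "B", and the function name are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Return the majority label among the k training vectors nearest to query.

    train is a list of (vector, label) pairs. Every query compares against
    all training vectors, so search time grows with the training set size.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors for two malware families
train = [((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((1.0, 0.0), "A"),
         ((5.0, 5.0), "B"), ((5.0, 6.0), "B"), ((6.0, 5.0), "B")]
print(knn_classify(train, (0.2, 0.1), k=3))
```

Because each query scans the full training list, the number of distance computations per query equals the number of stored samples, which is the scaling problem the abstract sets out to reduce.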

Clustering Algorithm for Efficient Energy Management in Sensor Network (센서 네트워크에서의 효율적 에너지 관리를 위한 클러스터링 알고리즘)

  • Seo, Sung-Yun;Jung, Won-Soo;Oh, Young-Hwan
    • The Journal of Korean Institute of Communications and Information Sciences / v.33 no.10B / pp.845-854 / 2008
  • In this paper, we propose a clustering algorithm for efficient energy management in sensor networks consisting of sensor nodes with restricted energy. The proposed algorithm improves energy efficiency by controlling sensing power, and its distinctive feature is that it can be applied in various network environments. The performance evaluation shows that energy efficiency improves by 5% when all sensor nodes are fixed and by $10{\sim}15\%$ when all sensor nodes are moving. Experiments confirm that the proposed algorithm improves the energy efficiency ratio by $5{\sim}15\%$ over the existing algorithm. The proposed algorithm derives an upper bound on energy efficiency for ubiquitous computing environments that span various network environments based on IEEE 802.15.4 ZigBee technology. It can also greatly extend the lifetime of a sensor network, in which the lifetime of each sensor node is short. We expect that it may further broaden the practical use of sensor network technology in rapidly changing network environments.

Design and Implementation of Spatial Characterization System using Density-Based Clustering (밀도 클러스터링을 이용한 공간 특성화 시스템 설계 및 구현)

  • You Jae-Hyun;Park Tae-Su;Ahn Chan-Min;Park Sang-Ho;Hong Jun-Sik;Lee Ju-Hong
    • Journal of the Korea Society of Computer and Information / v.11 no.2 s.40 / pp.43-52 / 2006
  • Recently, with increasing interest in ubiquitous computing, knowledge discovery methods are needed that take into account efficiency and effectiveness across a wide range of data in various forms. Spatial characterization, which extends earlier characterization methods by considering both spatial and non-spatial properties, makes it possible to find various forms of knowledge in a spatial region. Previous spatial characterization methods have the following problems. First, the discovered knowledge cannot support multiple spatial analysis. Second, useful knowledge cannot be guaranteed because the search is limited to the spatial region allocated by the user. Thus, this study proposes a spatial characterization method that applies density-based clustering.

Cluster Based Fuzzy Model Tree Using Node Information (상호 노드 정보를 이용한 클러스터 기반 퍼지 모델트리)

  • Park, Jin-Il;Lee, Dae-Jong;Kim, Yong-Sam;Cho, Young-Im;Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.1 / pp.41-47 / 2008
  • The cluster-based fuzzy model tree has the drawback that performance on testing data decreases when the training data are over-fitted. To reduce this sensitivity to over-fitting, we propose a modified cluster-based fuzzy model tree that uses node information. To construct the model tree, cluster centers are first calculated by a fuzzy clustering method using all input and output attributes. Then, linear models are constructed at internal nodes with fuzzy membership values between the centers and the input attributes. In the prediction step, membership values are calculated using the fuzzy distance between the input attributes and all centers on the path from the root to the leaf nodes. Finally, prediction is performed by the weighted average method with the linear models and fuzzy membership values. To show the effectiveness of the proposed method, we applied it to various datasets. Across these experiments, the proposed method shows better performance than the conventional cluster-based fuzzy model tree.
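The prediction step described in this abstract, a membership-weighted average of linear models, can be sketched as follows. The abstract does not give the membership formula, so this sketch assumes the standard fuzzy C-means membership with fuzzifier `m`; the centers, the linear models, and the function names are hypothetical, and the tree structure itself is omitted.

```python
import math


def memberships(x, centers, m=2.0):
    """Fuzzy membership of x in each center (assumed FCM-style formula)."""
    dists = [math.dist(x, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # x coincides with at least one center: split full membership among them
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    inv = [d ** -power for d in dists]
    total = sum(inv)
    return [v / total for v in inv]


def predict(x, centers, models, m=2.0):
    """Weighted average of linear model outputs, weighted by membership.

    models is a list of (weights, bias) pairs, one linear model per center.
    """
    u = memberships(x, centers, m)
    outputs = [sum(w * xi for w, xi in zip(weights, x)) + bias
               for weights, bias in models]
    return sum(ui * yi for ui, yi in zip(u, outputs)) / sum(u)


# Hypothetical one-dimensional example: two centers, two linear models
centers = [(0.0,), (10.0,)]
models = [((1.0,), 0.0), ((0.0,), 5.0)]
print(predict((2.0,), centers, models))
```

An input close to one center is dominated by that center's linear model, while inputs between centers blend the models smoothly instead of switching abruptly.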

Design of Nonlinear Model Using Type-2 Fuzzy Logic System by Means of C-Means Clustering (C-Means 클러스터링 기반의 Type-2 퍼지 논리 시스템을 이용한 비선형 모델 설계)

  • Baek, Jin-Yeol;Lee, Young-Il;Oh, Sung-Kwun
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.6 / pp.842-848 / 2008
  • This paper deals with the uncertainty problem in nonlinear system modeling by using Type-2 fuzzy logic sets. We design a Type-2 fuzzy logic system in which the antecedent and consequent parts of the rules are given as Type-2 fuzzy sets, and we analyze the performance of the ensuing nonlinear model with uncertainty. Here, the apexes of the antecedent membership functions of the rules are determined by the C-means clustering algorithm, and the apexes of the consequent membership functions are learned by back-propagation based on the gradient descent method. Also, the parameters of the fuzzy model are optimized by means of particle swarm optimization. The proposed model is demonstrated with two representative numerical examples, a synthetic mathematical data set and the Mackey-Glass time series data set, and we discuss the approximation and generalization abilities of the model.

Feature Selection of Fuzzy Pattern Classifier by using Fuzzy Mapping (퍼지 매핑을 이용한 퍼지 패턴 분류기의 Feature Selection)

  • Roh, Seok-Beom;Kim, Yong Soo;Ahn, Tae-Chon
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.6 / pp.646-650 / 2014
  • In this paper, we propose a new feature selection method to avoid the deterioration of pattern classification performance that results from the curse of dimensionality. The proposed method is based on the Fuzzy C-Means clustering algorithm, which analyzes the data points to divide them into several clusters, and on the concept of a function with fuzzy numbers. For a function whose independent variables are fuzzy numbers and whose dependent variable is a class label, each fuzzy number should be related to only one class label. Therefore, a good feature is an independent variable of a function with fuzzy numbers. Under this assumption, we calculate the goodness of each feature for the pattern classification problem. Finally, machine learning data sets are used to evaluate the classification ability of the proposed pattern classifier.
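The Fuzzy C-Means clustering that this method builds on can be sketched in pure Python as below. This shows only the standard clustering step, not the paper's feature selection procedure. The sample points, the fuzzifier `m=2.0`, the iteration limits, and the function name are assumptions for illustration.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard Fuzzy C-Means: returns (centers, membership matrix u)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u^m
        centers = []
        for k in range(c):
            weights = [u[i][k] ** m for i in range(n)]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)))
        # Membership update: inversely related to distance from each center
        shift = 0.0
        for i, p in enumerate(points):
            dists = [max(math.dist(p, ck), 1e-12) for ck in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            s = sum(inv)
            new_row = [v / s for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break  # memberships have stopped changing
    return centers, u


# Hypothetical data: two well-separated groups of points
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0),
        (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, u = fuzzy_c_means(data, c=2)
for point, row in zip(data, u):
    print(point, [round(v, 3) for v in row])
```

Each row of `u` gives one point's degree of membership in every cluster, which is the soft assignment that distinguishes Fuzzy C-Means from hard clustering methods.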

Image recommendation algorithm based on profile using user preference and visual descriptor (사용자 선호도와 시각적 기술자를 이용한 사용자 프로파일 기반 이미지 추천 알고리즘)

  • Kim, Deok-Hwan;Yang, Jun-Sik;Cho, Won-Hee
    • The KIPS Transactions:PartD / v.15D no.4 / pp.463-474 / 2008
  • The advancement of information technology and the popularization of the Internet have explosively increased the amount of multimedia content. Therefore, the demand for multimedia recommendation that satisfies users' needs is increasing rapidly. Until now, CF (collaborative filtering) has been used to recommend general items and multimedia content. However, general CF does not reflect the visual characteristics of image content, so it is not well suited to image recommendation. In addition, it has limitations with new item recommendation, the sparsity problem, and dynamic changes in user preference. In this paper, we present a new image recommendation method, FBCF (Feature Based Collaborative Filtering), to resolve these problems. FBCF builds a new user profile by clustering visual features in terms of user preference, and reflects the user's current preference in recommendations by using preference feedback. Experimental results using real mobile images demonstrate that FBCF outperforms conventional CF by 400% in terms of recommendation ratio.