• Title/Summary/Keyword: K-Means clustering algorithm

Contents-based Image Retrieval Using Color & Edge Information (칼라와 에지 정보를 이용한 내용기반 영상 검색)

  • Park, Dong-Won; An, Syungog; Ma, Ming; Singh, Kulwinder
    • The Journal of Korean Association of Computer Education / v.8 no.1 / pp.81-91 / 2005
  • In this paper we present a novel approach to image retrieval using color and edge information. We use the HSI (Hue, Saturation, Intensity) color space instead of RGB because it corresponds more closely to visual perception. In our system, the colors in an image are clustered into a small number of representative colors. The color feature descriptor consists of the representative colors and their percentages in the image. An improved cumulative color histogram distance measure is defined for this descriptor. We have also developed an efficient edge detection technique as an optional feature of the retrieval system to compensate for the weaknesses of the color feature. During query processing, the two features (color and edge information) can be used individually or combined in a specified proportion. Experimental results and precision-recall analysis show that the content-based retrieval system is effective in terms of retrieval performance and scalability.
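
The descriptor described in this abstract (representative colors plus their share of the image) can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function names `kmeans` and `color_descriptor`, the random seeding, and the use of plain Euclidean distance on three-component pixel tuples are all assumptions, and the sketch ignores the circular nature of hue.

```python
import random

def _nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centers[j])),
    )

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples (hypothetical helper)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: seed centers with k sampled points
    for _ in range(iters):
        labels = [_nearest(p, centers) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    # Final assignment so the labels match the returned centers.
    labels = [_nearest(p, centers) for p in points]
    return centers, labels

def color_descriptor(pixels, k):
    """Return [(representative_color, fraction_of_pixels), ...] for k clusters."""
    centers, labels = kmeans(pixels, k)
    n = len(pixels)
    return [(centers[j], labels.count(j) / n) for j in range(k)]

# Toy "image": six dark pixels and two bright ones, as 3-component tuples.
pixels = [(v, v, v) for v in (0.10, 0.11, 0.12, 0.13, 0.14, 0.15, 0.90, 0.91)]
descriptor = color_descriptor(pixels, k=2)
```

A retrieval system would then compare two such descriptors with a distance measure; the improved cumulative histogram distance mentioned in the abstract is not specified there, so it is not reproduced here.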

Design of Fuzzy Prediction System based on Dual Tuning using Enhanced Genetic Algorithms (강화된 유전알고리즘을 이용한 이중 동조 기반 퍼지 예측시스템 설계 및 응용)

  • Bang, Young-Keun; Lee, Chul-Heui
    • The Transactions of The Korean Institute of Electrical Engineers / v.59 no.1 / pp.184-191 / 2010
  • Many researchers have applied genetic algorithms to system optimization problems. Real-coded genetic algorithms are especially effective because their coding procedures are simpler than those of binary-coded genetic algorithms and they avoid the extra work of lengthening the chromosome to cover a wide search space. This paper therefore presents a design technique that improves the performance of a fuzzy system. The proposed system consists of two procedures. The primary tuning procedure coarsely tunes the fuzzy sets of the system using the k-means clustering algorithm, whose structure is very simple; the secondary tuning procedure then finely tunes the fuzzy sets using enhanced real-coded genetic algorithms, starting from the result of the primary procedure. In addition, this paper constructs multiple fuzzy systems using a data preprocessing procedure designed to reflect various characteristics of nonlinear data. Finally, the proposed fuzzy system is applied to time series prediction, and the effectiveness of the proposed techniques is verified by simulations of typical time series examples.

Design of Multiple Model Fuzzy Predictors using Data Preprocessing and its Application (데이터 전처리를 이용한 다중 모델 퍼지 예측기의 설계 및 응용)

  • Bang, Young-Keun; Lee, Chul-Heui
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.1 / pp.173-180 / 2009
  • It is difficult to predict non-stationary or chaotic time series, which include drift and/or non-linearity as well as uncertainty. To address this, we propose an effective prediction method that combines data preprocessing and multiple-model TS fuzzy predictors with a model selection mechanism. In the data preprocessing procedure, candidates for the optimal difference interval are determined by correlation analysis, and the corresponding difference data sets are generated and used as predictor input instead of the original data, because difference data stabilize the statistical characteristics of such time series and better reveal their implicit properties. TS fuzzy predictors are then constructed for the multiple-model bank, where the k-means clustering algorithm is used for the fuzzy partition of the input space and the least squares method is applied to identify the parameters of the fuzzy rules. Among the predictors in the model bank, the one that best minimizes the performance index is selected and used for prediction thereafter. Finally, an error compensation procedure based on correlation analysis is added to improve prediction accuracy. Computer simulations are performed to verify the effectiveness of the proposed method.
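
The preprocessing step in this abstract (differencing a series at an interval suggested by correlation analysis) might look roughly like the sketch below. The helper names `difference` and `autocorrelation` are hypothetical, and using the sample autocorrelation to rank candidate intervals is an assumption; the abstract does not say which correlation measure the authors use.

```python
def difference(series, interval):
    """Difference data: x[t] - x[t - interval] for every valid t."""
    return [series[t] - series[t - interval] for t in range(interval, len(series))]

def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var

# A series with period 4: lag 4 correlates strongly, lag 2 anti-correlates.
periodic = [0, 1, 0, -1] * 5
candidates = sorted(range(1, 6), key=lambda lag: autocorrelation(periodic, lag), reverse=True)
best_interval = candidates[0]
diffed = difference(periodic, best_interval)
```

For this toy series the best-ranked interval is the period itself, and differencing at that interval removes the periodic component entirely.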

Prediction System Design based on An Interval Type-2 Fuzzy Logic System using HCBKA (HCBKA를 이용한 Interval Type-2 퍼지 논리시스템 기반 예측 시스템 설계)

  • Bang, Young-Keun; Lee, Chul-Heui
    • Journal of Industrial Technology / v.30 no.A / pp.111-117 / 2010
  • To improve the performance of a prediction system, the system should reflect the uncertainty of nonlinear data well. This paper therefore presents multiple prediction systems based on Type-2 fuzzy sets. Each prediction system is constructed from an Interval Type-2 TSK Fuzzy Logic System and difference data, because a Type-2 Fuzzy Logic System is generally known to handle the uncertainty of nonlinear data better than a Type-1 Fuzzy Logic System, and difference data can provide steadier information than the original data. To improve the rule base of each fuzzy prediction system, the HCBKA (Hierarchical Correlation Based K-means clustering Algorithm) was applied because it can consider the correlation and the statistical characteristics of the data at the same time. A system selection method was then used to reduce the complexity of the proposed prediction system. Finally, the performance of the Type-1 prediction system and the Interval Type-2 prediction system was analyzed and compared using simulations of three typical time series examples.

EDGE: An Enticing Deceptive-content GEnerator as Defensive Deception

  • Li, Huanruo; Guo, Yunfei; Huo, Shumin; Ding, Yuehang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.5 / pp.1891-1908 / 2021
  • Cyber deception defense mitigates Advanced Persistent Threats (APTs) by deploying deceptive entities such as the Honeyfile. The Honeyfile distracts attackers from valuable digital documents and attracts unauthorized access by deliberately exposing fake content. The effectiveness of the distraction and the trap lies in how enticing the fake content is. However, existing studies on the Honeyfile pay little attention to this aspect. In this work, we seek to make fake text content more enticing by enhancing its readability, indistinguishability, and believability. To this end, an enticing deceptive-content generator, EDGE, is presented. EDGE is constructed in three steps: extracting key concepts with a semantics-aware K-means clustering algorithm, searching for candidate deceptive concepts within the Word2Vec model, and generating deceptive text content under the Integrated Readability Index (IR). Readability and believability performance analyses are also carried out. The experimental results show that EDGE generates indistinguishable deceptive text content without decreasing readability. Overall, EDGE proves effective at generating enticing deceptive text content as a deception defense against APTs.

Method for Estimating Intramuscular Fat Percentage of Hanwoo(Korean Traditional Cattle) Using Convolutional Neural Networks in Ultrasound Images

  • Kim, Sang Hyun
    • International journal of advanced smart convergence / v.10 no.1 / pp.105-116 / 2021
  • To preserve superior Hanwoo (Korean traditional cattle) breeding stock and secure quality competitiveness against imported beef, the production of high-quality Hanwoo beef is essential. %IMF (Intramuscular Fat Percentage) is one of the most important factors in evaluating the value of high-quality meat, although standards vary by country according to food culture and industrial conditions. It is therefore necessary to develop a %IMF estimation algorithm suited to Hanwoo. In this study, we propose a method for estimating the %IMF of Hanwoo from ultrasound images using a CNN. First, the chemically measured %IMF values are grouped into 10 classes using k-means clustering so that a CNN can be applied. Next, ROI images are extracted at regular intervals from each ultrasound image and used for CNN training and estimation. The proposed CNN model is composed of three stages of convolution layers and a fully connected layer. Experiments confirmed that the %IMF of Hanwoo was estimated with an accuracy of 98.2%. The correlation coefficient between the estimated and the measured %IMF is 0.97, about 10% higher than the 0.88 of the previous method.
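
Turning a continuous measurement into class labels with k-means, as the first step of this abstract describes, could be sketched as below. This is a toy illustration with three classes rather than the paper's ten; the function names `kmeans_1d` and `to_class`, the quantile-style initialization, and the sample values are assumptions, not details taken from the paper.

```python
def kmeans_1d(values, k, iters=50):
    """One-dimensional k-means; returns the k cluster centers in ascending order."""
    ordered = sorted(values)
    # Assumption: start the centers at evenly spaced positions of the sorted data.
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous center.
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)

def to_class(value, centers):
    """Class label = index of the nearest center (0 is the lowest class)."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))

# Made-up measurements that fall into three obvious groups.
measurements = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
centers = kmeans_1d(measurements, k=3)
labels = [to_class(m, centers) for m in measurements]
```

Because the centers are returned in ascending order, the class index is ordinal, which is convenient when the labels stand in for a continuous quantity.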

A Study on Heavy Rainfall Guidance Realized with the Aid of Neuro-Fuzzy and SVR Algorithm Using AWS Data (AWS자료 기반 SVR과 뉴로-퍼지 알고리즘 구현 호우주의보 가이던스 연구)

  • Kim, Hyun-Myung; Oh, Sung-Kwun; Kim, Yong-Hyuk; Lee, Yong-Hee
    • The Transactions of The Korean Institute of Electrical Engineers / v.63 no.4 / pp.526-533 / 2014
  • In this study, we introduce a design methodology for developing guidance for issuing heavy rainfall warnings using both RBFNNs (radial basis function neural networks) and an SVR (support vector regression) model, and we carry out a comparative study of the two pattern classifiers. Each classifier is designed with the aid of optimization and pre-processing algorithms. Because the predictive performance of existing heavy rainfall forecast systems is commonly affected by the various techniques used to process meteorological data, an under-sampling method is used to pre-process the input data; in addition, data discretization and feature extraction are used for SVR, and FCM clustering and PSO are used for RBFNNs. Observed AWS (Automatic Weather Station) data supplied by the KMA (Korea Meteorological Administration) are used for training and testing the proposed classifiers. The proposed classifiers provide the information needed to issue a heavy rain warning 1 to 3 hours in advance, using the selected meteorological data and the precipitation accumulated over 1 to 12 hours from the AWS data. To evaluate each classifier, the ETS (Equitable Threat Score) is used as the standard verification measure of predictive ability. The comparative study shows that the neuro-fuzzy method improves performance and gives stable predictions for heavy rainfall warning guidance.

A Study on the Cerber-Type Ransomware Detection Model Using Opcode and API Frequency and Correlation Coefficient (Opcode와 API의 빈도수와 상관계수를 활용한 Cerber형 랜섬웨어 탐지모델에 관한 연구)

  • Lee, Gye-Hyeok; Hwang, Min-Chae; Hyun, Dong-Yeop; Ku, Young-In; Yoo, Dong-Young
    • KIPS Transactions on Computer and Communication Systems / v.11 no.10 / pp.363-372 / 2022
  • Since the COVID-19 pandemic, ransomware activity has intensified along with the expansion of remote work. Anti-virus vendors are trying to respond to ransomware, but traditional file signature-based static analysis can be defeated by diversification, obfuscation, variants, or the emergence of new ransomware. Various studies on ransomware detection are under way, and the main research types at present are signature-based static analysis and behavior-based dynamic analysis. In this paper, the frequencies of the opcodes in the ".text" section and of the Native APIs used in practice were extracted, and the association between the feature information selected using the K-means clustering algorithm, cosine similarity, and the Pearson correlation coefficient was analyzed. In addition, through experiments to classify and detect worms among other malware types and Cerber-type ransomware, it was verified that the selected feature information is specialized for detecting a specific ransomware (Cerber). Combining the finally selected feature information, applying it to machine learning, and performing hyperparameter optimization gave a detection rate of up to 93.3%.
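
The two similarity measures named in this abstract can be computed on frequency vectors as in the sketch below. The opcode names and counts are invented for illustration only, and the helper names `cosine_similarity` and `pearson` are assumptions; the abstract does not describe how the authors combine these measures with the clustering step.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])

# Hypothetical opcode frequency vectors over the same opcode order.
opcodes = ["mov", "push", "call", "xor"]
sample_a = [120, 40, 30, 10]
sample_b = [240, 80, 60, 20]   # same profile as sample_a, twice the size
sample_c = [10, 30, 40, 120]   # reversed profile

similar = cosine_similarity(sample_a, sample_b)
correlated = pearson(sample_a, sample_c)
```

Cosine similarity ignores overall scale, so `sample_a` and `sample_b` are treated as identical profiles, while Pearson correlation additionally removes each vector's mean before comparing.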

Scalable Cluster Overlay Source Routing Protocol (확장성을 갖는 클러스터 기반의 라우팅 프로토콜)

  • Jang, Kwang-Soo; Yang, Hyo-Sik
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.3 / pp.83-89 / 2010
  • Scalable routing is one of the key challenges in designing and operating large-scale MANETs. The performance of the routing protocols proposed so far is guaranteed only under various limitations: for example, it depends on the number of nodes in the network, or the protocol needs the location information of the destination node. Because of this dependency on the number of nodes, the performance of previous routing protocols degrades dramatically as the number of nodes increases. We propose the Cluster Overlay Dynamic Source Routing (CODSR) protocol. We analyze its performance by computer simulation under various conditions: diameter scaling and density scaling. The developed algorithm outperforms the DSR algorithm, e.g., by more than 90% in normalized routing load. The operation of CODSR is very simple, and we show that the message and time complexity of CODSR is independent of the number of nodes in the network, which makes CODSR highly scalable.

Analysis on the Efficiency Change in Electric Vehicle Charging Stations Using Multi-Period Data Envelopment Analysis (다기간 자료포락분석을 이용한 전기차 충전소 효율성 변화 분석)

  • Son, Dong-Hoon; Gang, Yeong-Su; Kim, Hwa-Joong
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.2 / pp.1-14 / 2021
  • It is highly challenging to measure the efficiency of electric vehicle charging stations (EVCSs) because the factors affecting their operational characteristics vary over time in practice. Environmental factors around the EVCSs can be considered in the efficiency measurement because they affect the charging behavior of electric vehicle drivers, resulting in variations in the accessibility and attractiveness of the EVCSs. Considering the dynamics of these factors, this paper examines the technical efficiency of 622 electric vehicle charging stations in Seoul using data envelopment analysis (DEA). The DEA is formulated as a multi-period, output-oriented, constant returns to scale model. Five inputs (floating population, number of nearby EVCSs, average distance to nearby EVCSs, traffic volume, and traffic congestion) are considered, and the charging frequency of the EVCSs is used as the output. The efficiency measurement shows that a small number of EVCSs account for most of the charging demand in certain periods, while the others face weak charging demand. Tobit regression analyses show that traffic congestion negatively affects the efficiency of EVCSs, while traffic volume and the number of nearby EVCSs are positive factors that improve efficiency. We identify some notable characteristics of efficient EVCSs by comparing the means of the inputs across the groups classified by the K-means clustering algorithm. This analysis shows that efficient EVCSs are generally characterized by a high number of nearby EVCSs and a low level of traffic congestion.