• Title/Summary/Keyword: fuzzy similarity

Optimization Driven MapReduce Framework for Indexing and Retrieval of Big Data

  • Abdalla, Hemn Barzan;Ahmed, Awder Mohammed;Al Sibahee, Mustafa A.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.5 / pp.1886-1908 / 2020
  • With technical advances, the volume of big data grows daily, and traditional software tools struggle to handle it. The presence of imbalanced data in big data is also a major concern for researchers. To manage big data effectively and to deal with imbalanced data, this paper proposes a new indexing algorithm for retrieving big data in the MapReduce framework. In the mappers, data are clustered with the Sparse Fuzzy-c-means (Sparse FCM) algorithm. The reducer combines the clusters generated by the mappers and clusters the data again with Sparse FCM. Two-level query matching then determines the requested data: the first level identifies the cluster, and the second level accesses the requested data within it. The data are ranked using the proposed Monarch chaotic whale optimization algorithm (M-CWOA), which combines Monarch butterfly optimization (MBO) [22] and the chaotic whale optimization algorithm (CWOA) [21]. The Parametric Enabled-Similarity Measure (PESM) is adapted for matching the similarities between two datasets. The proposed M-CWOA outperformed other methods, with a maximal precision of 0.9237, recall of 0.9371, and F1-score of 0.9223.
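The two-level query matching described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: cosine similarity stands in for PESM (which the abstract does not define), a plain sort stands in for the M-CWOA ranking, and the cluster layout, field names, and vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; a stand-in for the PESM measure named in the abstract."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def two_level_match(query, clusters, top_k=2):
    """Level 1 picks the closest cluster; level 2 ranks the records inside it."""
    # Level 1: compare the query against each cluster centroid.
    best = max(clusters, key=lambda c: cosine(query, c["centroid"]))
    # Level 2: rank the records of the chosen cluster (plain sort, not M-CWOA).
    ranked = sorted(best["records"],
                    key=lambda r: cosine(query, r["vector"]),
                    reverse=True)
    return best["name"], [r["id"] for r in ranked[:top_k]]

# Hypothetical index with two clusters of two records each.
clusters = [
    {"name": "c0", "centroid": [1.0, 0.0],
     "records": [{"id": "d1", "vector": [0.9, 0.1]},
                 {"id": "d2", "vector": [1.0, 0.4]}]},
    {"name": "c1", "centroid": [0.0, 1.0],
     "records": [{"id": "d3", "vector": [0.1, 0.9]},
                 {"id": "d4", "vector": [0.8, 1.0]}]},
]

print(two_level_match([0.2, 1.0], clusters, top_k=1))
```

Restricting the second-level search to one cluster is what keeps the lookup cheap: only the records of the matched cluster are scored.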

Genetic Diversity of Wild Quail in China Ascertained with Microsatellite DNA Markers

  • Chang, G.B.;Chang, H.;Liu, X.P.;Zhao, W.M.;Ji, D.J.;Mao, Y.J.;Song, G.M.;Shi, X.K.
    • Asian-Australasian Journal of Animal Sciences / v.20 no.12 / pp.1783-1790 / 2007
  • The genetic diversity of domestic quail and two wild quail species found in China, Japanese quail (Coturnix japonica) and Common quail (Coturnix coturnix), was studied using microsatellite DNA markers. According to a comparison of genetic indices in the three quail populations, such as Polymorphism Information Content (PIC), Mean Heterozygosity ($\bar{H}$) and Fixation Index, wild Common quail possessed rich genetic diversity, with 4.67 alleles per site. Its values for PIC and $\bar{H}$ were the highest, 0.5732 and 0.6621, respectively. Domestic quail had the lowest values, 0.5467 and 0.5933, respectively. Wild Japanese quail differed little in genetic diversity from domestic quail. In addition, in the fuzzy cluster analysis based on standard genetic distance, the similarity relationship matrix coefficient between wild Japanese quail and domestic quail was 0.937, and that between wild Common quail and domestic quail was 0.783. These results show that wild Japanese quail are phylogenetically closer to domestic quail than wild Common quail are. At the molecular level, they provide useful data about the genetic background of quail and further support the hypothesis that domestic quail originated from wild Japanese quail.
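The abstract does not say which estimators were used for heterozygosity and PIC. The sketch below assumes the standard per-locus definitions, expected heterozygosity $H = 1 - \sum_i p_i^2$ and $\mathrm{PIC} = 1 - \sum_i p_i^2 - \sum_{i<j} 2 p_i^2 p_j^2$, and averages them over loci. The allele frequencies are hypothetical, not data from the paper.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity at one locus: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def pic(freqs):
    """Polymorphism Information Content at one locus (standard definition)."""
    sq = [p * p for p in freqs]
    cross = sum(2 * sq[i] * sq[j]
                for i in range(len(sq))
                for j in range(i + 1, len(sq)))
    return 1.0 - sum(sq) - cross

def mean_over_loci(metric, loci):
    """Average a per-locus metric across all loci."""
    return sum(metric(freqs) for freqs in loci) / len(loci)

# Hypothetical allele frequencies for two microsatellite loci.
loci = [
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
]

print(mean_over_loci(expected_heterozygosity, loci))
print(mean_over_loci(pic, loci))
```

PIC is never larger than the expected heterozygosity at the same locus, which matches the ordering of the two values reported for each population.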

Short Term Load Forecasting Algorithm for Lunar New Year's Day

  • Song, Kyung-Bin;Park, Jeong-Do;Park, Rae-Jun
    • Journal of Electrical Engineering and Technology / v.13 no.2 / pp.591-598 / 2018
  • Short-term load forecasts are affected in complex ways by socioeconomic factors and weather variables, and therefore have non-linear characteristics. Researchers have so far improved forecast accuracy through diverse techniques such as artificial neural networks, fuzzy theories, and statistical methods. Short-term load forecast errors for special days are much higher than those for weekdays. The errors are mainly caused by the irregularity of social activities and by the lack of similar past data needed to construct load forecast models. In this study, the load characteristics of the Lunar New Year's Day holidays, known as the holiday period with the highest forecast errors, are analyzed to propose a load forecast technique for those holidays. To solve the problem of insufficient input data, the similarity between the load patterns of past Lunar New Year's Day holidays was judged by Euclidean distance. The Lunar New Year's Day holiday periods for 2011-2012 were forecast with the proposed method, which yielded better results than the comprehensive analysis method or the knowledge-based method.
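Judging similarity between past holiday load patterns by Euclidean distance can be sketched as follows. The normalized load profiles, the year keys, and the choice of the two nearest years are illustrative assumptions; the abstract does not describe how the profiles were built or how many were kept.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally long load profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar_years(target, history, k=2):
    """Return the k past years whose holiday load profile is closest to the target."""
    return sorted(history, key=lambda year: euclidean(target, history[year]))[:k]

# Hypothetical normalized load profiles for past Lunar New Year holidays.
history = {
    2008: [0.62, 0.54, 0.71, 0.79],
    2009: [0.80, 0.75, 0.90, 0.95],
    2010: [0.58, 0.57, 0.66, 0.82],
}
target = [0.60, 0.55, 0.70, 0.80]

print(most_similar_years(target, history))
```

The selected years would then supply the input data for the forecast model, which is the step the scarcity of holiday data makes difficult.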

A Study on the Malware Classification Method using API Similarity Analysis (API 유사도 분석을 통한 악성코드 분류 기법 연구)

  • Kang, Hong-Koo;Cho, Hyei-Sun;Kim, Byung-Ik;Lee, Tae-Jin;Park, Hae-Ryong
    • Proceedings of the Korea Information Processing Society Conference / 2013.11a / pp.808-810 / 2013
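  • As Internet use has become widespread, malware that abuses websites and email for political and economic purposes is spreading rapidly. Most of the distributed malware consists of variants of existing malware. Research on classifying similar malware in order to detect such variants is therefore active. However, existing studies classify samples using information obtained through static analysis, which makes it difficult to analyze the behavior that actually occurs. This paper proposes a technique that extracts information about the APIs (Application Program Interfaces) called by malware and analyzes their similarity to classify the malware. To analyze the similarity of the called APIs, a malware API analysis system capable of dynamic API hooking was developed, and comparable unique patterns were generated using ssdeep, a fuzzy hash. Experiments on real variant malware samples confirmed the usefulness of the proposed classification technique.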
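The idea of grouping samples by the similarity of their API call traces can be sketched with the standard library. This is not ssdeep and does not compute a fuzzy hash: `difflib.SequenceMatcher` is swapped in as a stand-in that also yields a 0-100 score. The sample names, call traces, threshold, and greedy grouping rule are all assumptions for illustration.

```python
from difflib import SequenceMatcher

def api_similarity(calls_a, calls_b):
    """Return a 0-100 similarity score between two API call sequences."""
    ratio = SequenceMatcher(None, calls_a, calls_b, autojunk=False).ratio()
    return round(ratio * 100)

def classify(samples, threshold=70):
    """Greedily group samples that score at least `threshold` against a group's first member."""
    groups = []
    for name, calls in samples.items():
        for group in groups:
            representative = samples[group[0]]
            if api_similarity(calls, representative) >= threshold:
                group.append(name)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([name])
    return groups

# Hypothetical API call traces collected by dynamic hooking.
samples = {
    "sample_a": ["CreateFileW", "WriteFile", "RegSetValueExW", "CreateProcessW"],
    "sample_b": ["CreateFileW", "WriteFile", "RegSetValueExW", "Sleep", "CreateProcessW"],
    "sample_c": ["socket", "connect", "send", "recv"],
}

groups = classify(samples)
print(groups)
```

A variant that inserts one extra call, as `sample_b` does, still scores close to its original, which is the property a fuzzy comparison is meant to provide.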

A Metaheuristic Approach Towards Enhancement of Network Lifetime in Wireless Sensor Networks

  • J. Samuel Manoharan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.4 / pp.1276-1295 / 2023
  • Sensor networks are now an essential aspect of wireless communication, especially with the introduction of new gadgets and protocols. Their ability to be deployed anywhere, especially where human presence is undesirable, makes them well suited to remote observation and control. Despite their vast range of applications, from home monitoring to hostile-territory monitoring, limited battery power remains a limiting factor in their efficacy. Analyzing and transmitting data requires intelligent use of the available battery power. Several studies have established effective routing algorithms based on clustering. However, choosing optimal cluster heads and similarity measures for clustering significantly increases computing time and cost. This work proposes and implements a simple two-phase technique of route creation and maintenance that ensures route reliability by employing nature-inspired ant colony optimization followed by a fuzzy decision engine (FDE). The performance of the proposed HRCM is compared with benchmark methods such as PSO, ACO and GWO. The objective is to establish the superiority of the proposed work over existing optimization methods in a standalone configuration. On average, the proposed hybrid model improves energy consumption by 15% and latency by 12% over standalone optimization methods.

An Approach to Drought Vulnerability Assessment using Multi Criteria Decision Making Method (다기준 의사결정기법을 적용한 가뭄취약성 평가 방법에 관한 연구)

  • Shin, Hyung Jin;Lee, Gyu Min;Lee, Jae Nam;Kwon, Min Sung;Kang, Mun Sung
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.385-385 / 2020
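  • This study aimed to establish and apply a drought vulnerability assessment scheme that includes the various factors related to drought. The assessment consists of three stages: selecting the assessment factors and weights, building a database of assessment data, and performing the assessment by combining the data with the weights. The Delphi survey technique was applied to select the factors and weights, and TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution), a widely applied MCDM (Multi-Criteria Decision Making) method, was used as the assessment technique. The assessment factors were grouped into meteorological factors, agricultural factors, socioeconomic factors, and environmental factors (Natural System). To build the database for the selected factors, data managed by relevant agencies such as the Korea Meteorological Administration, the Korea Rural Community Corporation, and the Korea Water Resources Corporation were collected. The scheme covers the period from March 2016 to September 2019 at the level of city, county, and district (si/gun/gu) administrative units in Korea, 167 regions in total, and the results differ slightly depending on the weighting method (rank method, ratio method, fuzzy, and so on). Comparing the drought forecast results with the vulnerability assessment results, Hongseong-gun in Chungcheongnam-do had the largest number of drought forecasts and warnings issued during the period, and Boryeong-si and Seosan-si in Chungcheongnam-do also showed very high frequencies. The assessment identified many drought-vulnerable areas in Chungcheongbuk-do, Gyeongsangnam-do, and Jeollanam-do, indicating that drought response plans are needed for these regions.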
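A minimal sketch of the TOPSIS step named in this abstract follows, using the textbook procedure: vector normalization, weighting, ideal and anti-ideal points, then relative closeness. The regions, criteria, weights, and the choice of treating the most vulnerable profile as the "ideal" point are hypothetical; the abstract does not give the study's factors in this form.

```python
import math

def topsis(matrix, weights, benefit):
    """Return the relative closeness of each alternative to the ideal point.

    matrix[i][j] is the value of criterion j for alternative i.
    benefit[j] is True when a larger value of criterion j lies toward the ideal.
    """
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    columns = list(zip(*v))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in v:
        d_pos = math.sqrt(sum((x - i) ** 2 for x, i in zip(row, ideal)))
        d_neg = math.sqrt(sum((x - a) ** 2 for x, a in zip(row, anti)))
        total = d_pos + d_neg
        # If every alternative is identical, both distances are zero.
        scores.append(d_neg / total if total else 0.0)
    return scores

# Hypothetical data: rows are regions; columns are rainfall deficit,
# farmland ratio, and water supply capacity. Here the "ideal" point is the
# most vulnerable profile, so a higher score means more vulnerable.
matrix = [
    [3.0, 0.6, 20.0],
    [2.0, 0.4, 50.0],
    [1.0, 0.2, 80.0],
]
weights = [0.5, 0.3, 0.2]
benefit = [True, True, False]

scores = topsis(matrix, weights, benefit)
print(scores)
```

Changing `weights` is where the rank, ratio, and fuzzy weighting methods mentioned in the abstract would enter, which is why the results shift slightly between them.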

Health Risk Management using Feature Extraction and Cluster Analysis considering Time Flow (시간흐름을 고려한 특징 추출과 군집 분석을 이용한 헬스 리스크 관리)

  • Kang, Ji-Soo;Chung, Kyungyong;Jung, Hoill
    • Journal of the Korea Convergence Society / v.12 no.1 / pp.99-104 / 2021
  • In this paper, we propose health risk management using feature extraction and cluster analysis that considers time flow. The proposed method proceeds in three steps. The first is pre-processing and feature extraction. It collects the user's lifelog with a wearable device, removes incomplete data, errors, noise, and contradictory data, and processes missing values. For feature extraction, important variables are then selected through principal component analysis, and the data are classified by the relationships among them using the correlation coefficient and covariance. To analyze the features extracted from the lifelog, dynamic clustering is performed with the K-means algorithm, taking the passage of time into account. New data are clustered with a similarity distance measure based on the increment of the sum of squared errors. Next, information about each cluster is extracted with the passage of time taken into account. Using a health decision-making system built on the feature clusters, risks can then be managed through factors such as physical characteristics, lifestyle habits, disease status, risk of health care events, and predictability. The performance evaluation compares the proposed method with fuzzy and kernel-based clustering using Precision, Recall, and F-measure, and the proposed method is rated highly. The proposed method therefore makes it possible to predict accurately and manage appropriately the user's potential health risk by using the similarity with patients.
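Assigning new data by the increment of the sum of squared errors (SSE) can be sketched as below. The abstract does not spell out the rule, so this assumes the usual identity: adding a point $x$ to a cluster of $n$ points with centroid $c$ raises that cluster's SSE by $\frac{n}{n+1}\lVert x - c\rVert^2$. The cluster contents are hypothetical.

```python
def sse_increase(point, centroid, size):
    """Increase in a cluster's SSE if `point` joins it: n/(n+1) * ||x - c||^2."""
    dist_sq = sum((p - c) ** 2 for p, c in zip(point, centroid))
    return size / (size + 1) * dist_sq

def assign(point, clusters):
    """Put the point in the cluster with the smallest SSE increase and update it."""
    best = min(range(len(clusters)),
               key=lambda i: sse_increase(point,
                                          clusters[i]["centroid"],
                                          clusters[i]["size"]))
    cluster = clusters[best]
    n = cluster["size"]
    # Incremental centroid update, so earlier points need not be stored.
    cluster["centroid"] = [(n * c + p) / (n + 1)
                           for c, p in zip(cluster["centroid"], point)]
    cluster["size"] = n + 1
    return best

# Hypothetical clusters summarized by centroid and size.
clusters = [
    {"centroid": [0.0, 0.0], "size": 4},
    {"centroid": [10.0, 10.0], "size": 2},
]

chosen = assign([1.0, 1.0], clusters)
print(chosen, clusters[chosen])
```

Because only the centroid and size are kept per cluster, new lifelog records can be placed as they arrive without re-running K-means on the full history.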

Region-based Multi-level Thresholding for Color Image Segmentation (영역 기반의 Multi-level Thresholding에 의한 컬러 영상 분할)

  • Oh, Jun-Taek;Kim, Wook-Hyun
    • Journal of the Institute of Electronics Engineers of Korea SP / v.43 no.6 s.312 / pp.20-27 / 2006
  • Multi-level thresholding is widely used in image segmentation. However, most existing methods are not suited to direct use in applications, nor have they been extended to the image segmentation step. This paper proposes region-based multi-level thresholding as an image segmentation method. First, the pixels of each color channel are classified into two clusters with the EWFCM (Entropy-based Weighted Fuzzy C-Means) algorithm, an improved FCM algorithm that uses spatial information between pixels. To obtain better segmentation results, the number of clusters is then reduced by a region-based reclassification step based on the similarity between the regions in one cluster and the other clusters. The clusters are created from the per-channel classification information of the pixels. Finally, because many regions still remain in the image, region merging is performed as post-processing with a Bayesian algorithm based on the Kullback-Leibler distance between a region and its neighboring regions. Experiments show that region-based multi-level thresholding is superior to cluster-based and pixel-based multi-level thresholding and to the existing method, and that the post-processing step yields much better segmentation results.
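The Kullback-Leibler distance used to compare a region with its neighbors can be illustrated on histograms. The abstract does not state which form of the distance or which region features were used, so this sketch assumes a symmetric KL divergence over smoothed intensity histograms and leaves out the Bayesian merging rule. The histograms are hypothetical.

```python
import math

def normalize(hist, eps=1e-9):
    """Turn bin counts into probabilities, smoothing so that no bin is zero."""
    total = sum(hist) + eps * len(hist)
    return [(h + eps) / total for h in hist]

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def symmetric_kl(hist_a, hist_b):
    """Symmetric KL distance between two region histograms."""
    p, q = normalize(hist_a), normalize(hist_b)
    return kl(p, q) + kl(q, p)

# Hypothetical 3-bin intensity histograms for a region and two neighbors.
region = [10, 20, 30]
neighbor_similar = [12, 19, 29]
neighbor_different = [30, 20, 10]

print(symmetric_kl(region, neighbor_similar))
print(symmetric_kl(region, neighbor_different))
```

A merging step would favor the neighbor with the smaller distance, here `neighbor_similar`.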

Automatic Recommendation of (IP)TV programs based on A Rank Model using Collaborative Filtering (협업 필터링을 이용한 순위 정렬 모델 기반 (IP)TV 프로그램 자동 추천)

  • Kim, Eun-Hui;Pyo, Shin-Jee;Kim, Mun-Churl
    • Journal of Broadcast Engineering / v.14 no.2 / pp.238-252 / 2009
  • With the rapid increase in available content brought by the convergence of broadcasting and the Internet, efficient access to personally preferred content has become an important issue. This paper studies a recommendation scheme for TV programs that uses collaborative filtering. The proposed scheme consists of offline and online computation. In the offline computation, each user's preference for TV programs is inferred implicitly in terms of program contents, genres and channels, and users are clustered by a dynamic fuzzy clustering method based on their genre and channel preferences. After an active user logs in, the online computation (1) retrieves users similar to the active user with a similarity measure based on the active user's standard preference list, (2) filters out the TV programs watched by the similar users that do not exist in the EPG, and (3) ranks the remaining TV programs with the proposed rank model. In particular, the BM (Best Match) algorithm is extended so that the recommended TV programs are ranked according to the user's preferences. Experimental results show that the proposed scheme with the extended BM model yields a prediction accuracy of 62.1% in the top five recommendations for the TV watching history of 2,441 people.

A Passport Recognition and face Verification Using Enhanced fuzzy ART Based RBF Network and PCA Algorithm (개선된 퍼지 ART 기반 RBF 네트워크와 PCA 알고리즘을 이용한 여권 인식 및 얼굴 인증)

  • Kim Kwang-Baek
    • Journal of Intelligence and Information Systems / v.12 no.1 / pp.17-31 / 2006
  • This paper proposes passport recognition and face verification methods that automatically recognize passport codes and detect forged passports, in order to improve the efficiency and systematic control of immigration management. Correcting the slant is very important for character recognition and face verification, since slanted passport images can have various unwanted effects on the recognition of individual codes and faces. Therefore, after the passport image is smeared, the longest extracted string of characters is selected. The angle is adjusted using the slant of the straight line, relative to the horizontal, that connects the centers of thickness of the left and right parts of the string. Passport codes are extracted with the Sobel operator, horizontal smearing, and an 8-neighborhood contour tracking algorithm. The code strings are converted into binary format by applying a repeating binarization method to the area of the extracted passport code strings. The code strings are restored by applying a CDM mask to the binarized string area, and individual codes are extracted with the 8-neighborhood contour tracking algorithm. The proposed RBF network applies to its middle layer an enhanced fuzzy ART algorithm that uses the fuzzy logic connection operator and dynamically controls the vigilance parameter. The face is authenticated by measuring the similarity between the feature vector of the facial image from the passport and the feature vector of the facial image from the database, which is constructed with the PCA algorithm. In several tests using a forged passport and passports with slanted images, the proposed method proved effective in recognizing passport codes and verifying facial images.
