• Title/Summary/Keyword: Clustering coefficient

Understanding the Performance of Collaborative Filtering Recommendation through Social Network Analysis (소셜네트워크 분석을 통한 협업필터링 추천 성과의 이해)

  • Ahn, Sung-Mahn;Kim, In-Hwan;Choi, Byoung-Gu;Cho, Yoon-Ho;Kim, Eun-Hong;Kim, Myeong-Kyun
    • The Journal of Society for e-Business Studies / v.17 no.2 / pp.129-147 / 2012
  • Collaborative filtering (CF), one of the most successful recommendation techniques, has been used in many applications, such as recommending web pages, movies, music, articles, and products. A critical issue in CF is why recommendation performance differs across application domains. Prior literature has focused only on data characteristics to explain the difference, and little attention has been paid to a systematic explanation. To fill this research gap, this study attempts to explain systematically why recommendation performance differs, using structural indexes of social networks. For this purpose, we developed hypotheses on the relationships between structural indexes of the social network and the recommendation performance of collaborative filtering, and tested them empirically. The results showed that density and inclusiveness positively affected recommendation performance, while the clustering coefficient negatively affected it. This study can serve as a stepping stone for understanding collaborative filtering recommendation performance. Furthermore, it may help managers decide whether to adopt recommendation systems.
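
The three structural indexes named in this abstract can be illustrated on a toy undirected graph. The sketch below assumes the common textbook definitions (density as the share of possible edges present, inclusiveness as the share of non-isolated nodes, and the average local clustering coefficient). The abstract does not say how the authors built the network or which exact definitions they used, so `structural_indexes` and the toy data are hypothetical.

```python
from itertools import combinations

def structural_indexes(nodes, edges):
    """Return (density, inclusiveness, average clustering) of an undirected graph."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2  # each edge is counted twice
    # Density: edges present divided by the n(n-1)/2 possible edges.
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    # Inclusiveness (assumed definition): share of nodes with at least one tie.
    inclusiveness = sum(1 for nb in adj.values() if nb) / n if n else 0.0
    local = []
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            local.append(0.0)  # undefined for k < 2; treated as 0 here
            continue
        # Count ties among the neighbours of this node.
        links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
        local.append(2 * links / (k * (k - 1)))
    clustering = sum(local) / n if n else 0.0
    return density, inclusiveness, clustering

# Hypothetical toy network: a triangle a-b-c, a pendant node d, an isolate e.
toy_nodes = ["a", "b", "c", "d", "e"]
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
density, inclusiveness, clustering = structural_indexes(toy_nodes, toy_edges)
```

Treating nodes with fewer than two neighbours as having a clustering coefficient of 0 is one convention among several; excluding them from the average would give a different value.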

A Study on the Cerber-Type Ransomware Detection Model Using Opcode and API Frequency and Correlation Coefficient (Opcode와 API의 빈도수와 상관계수를 활용한 Cerber형 랜섬웨어 탐지모델에 관한 연구)

  • Lee, Gye-Hyeok;Hwang, Min-Chae;Hyun, Dong-Yeop;Ku, Young-In;Yoo, Dong-Young
    • KIPS Transactions on Computer and Communication Systems / v.11 no.10 / pp.363-372 / 2022
  • Since the COVID-19 pandemic, ransomware attacks have intensified along with the expansion of remote work. Antivirus vendors are trying to respond to ransomware, but traditional file-signature-based static analysis can be defeated by diversification, obfuscation, variants, or the emergence of new ransomware. Various studies on ransomware detection are under way, and the main research types at present are signature-based static analysis and behavior-based dynamic analysis. In this paper, the frequencies of opcodes in the ".text" section and of the Native APIs used in practice were extracted, and the association between the feature information selected using the K-means clustering algorithm, cosine similarity, and the Pearson correlation coefficient was analyzed. In addition, experiments that classify and detect worms, among other malware types, and Cerber-type ransomware verified that the selected feature information is specialized for detecting a specific ransomware family (Cerber). Combining the finally selected feature information, applying it to machine learning, and performing hyperparameter optimization yielded a detection rate of up to 93.3%.
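
Two of the similarity measures named in this abstract can be sketched on frequency vectors. The example below is illustrative only: the opcode counts are invented, and the abstract does not describe how the authors combined these measures with K-means, so nothing here reproduces their pipeline.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def pearson(x, y):
    """Pearson r, computed as the cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_similarity(centered_x, centered_y)

# Hypothetical opcode frequency vectors (e.g. counts of four opcodes).
sample_a = [120, 45, 30, 5]
sample_b = [240, 90, 60, 10]   # same profile as sample_a, twice the size
sample_c = [5, 30, 45, 120]    # reversed profile

cos_ab = cosine_similarity(sample_a, sample_b)
r_ab = pearson(sample_a, sample_b)
r_ac = pearson(sample_a, sample_c)
```

Both functions divide by a vector norm, so they raise `ZeroDivisionError` for an all-zero vector (and `pearson` for a constant vector); a real feature-selection step would need to handle those cases.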

Performance Improvement of Collaborative Filtering System Using Associative User′s Clustering Analysis for the Recalculation of Preference and Representative Attribute-Neighborhood (선호도 재계산을 위한 연관 사용자 군집 분석과 Representative Attribute -Neighborhood를 이용한 협력적 필터링 시스템의 성능향상)

  • Jung, Kyung-Yong;Kim, Jin-Su;Kim, Tae-Yong;Lee, Jung-Hyun
    • The KIPS Transactions:PartB / v.10B no.3 / pp.287-296 / 2003
  • Much research has focused on collaborative filtering techniques in recommender systems. However, these studies suffer from the first-rater problem and the sparsity problem. The main purpose of this paper is to solve these problems. We propose a method for predicting user preference that uses a Bayesian estimated value and associative user clustering for the recalculation of preference. In addition, to compensate for the shortcoming that item attributes are not considered, we use the Representative Attribute-Neighborhood method, which extracts the representative attributes that most affect preference and uses them for prediction when finding similar neighbors. We improved efficiency by applying associative user clustering analysis to the collaborative filtering algorithm, so that the preference for a specific item is calculated within the cluster's item vector. To address the sparsity and first-rater problems, associative users are clustered by genre using the Association Rule Hypergraph Partitioning algorithm, and new users are classified into one of these genres by a Naive Bayes classifier. Furthermore, to obtain the similarity between the users belonging to the classified genre and new users, the method assigns different estimated values, obtained through Naive Bayes learning, to the items that users have evaluated. Applying the preferences with these estimated values to the Pearson correlation coefficient yields higher accuracy because errors caused by missing values are reduced. We evaluate our method on a large collaborative filtering database of user ratings, and it significantly outperforms the previously proposed method.

Health Risk Management using Feature Extraction and Cluster Analysis considering Time Flow (시간흐름을 고려한 특징 추출과 군집 분석을 이용한 헬스 리스크 관리)

  • Kang, Ji-Soo;Chung, Kyungyong;Jung, Hoill
    • Journal of the Korea Convergence Society / v.12 no.1 / pp.99-104 / 2021
  • In this paper, we propose health risk management using feature extraction and cluster analysis that considers time flow. The proposed method proceeds in three steps. The first is the pre-processing and feature extraction step. It collects the user's lifelog with a wearable device, removes incomplete data, errors, noise, and contradictory data, and processes missing values. Then, for feature extraction, important variables are selected through principal component analysis, and data with similar relationships are grouped using the correlation coefficient and covariance. To analyze the features extracted from the lifelog, dynamic clustering is performed with the K-means algorithm in consideration of the passage of time. New data are clustered with a similarity distance measure based on the increment of the sum of squared errors. The next step extracts information about the clusters by considering the passage of time. Using the health decision-making system built on these feature clusters, risks can then be managed through factors such as physical characteristics, lifestyle habits, disease status, the risk of health care events, and predictability. The performance evaluation compares the proposed method with fuzzy and kernel-based clustering using Precision, Recall, and F-measure. In the evaluation, the proposed method performs well. The proposed method therefore makes it possible to predict accurately and manage appropriately the user's potential health risk by using the similarity with patients.
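
One way to read "clustered through the similarity distance measurement method based on the increment of the sum of squared errors" is to assign each new point to the cluster whose SSE grows least. The sketch below assumes that reading; the abstract does not give the authors' exact rule. It relies on the standard identity that adding a point x to a cluster of n points with centroid c raises its SSE by n/(n+1)·‖x − c‖². The function names and the cluster dictionary layout are hypothetical.

```python
def sse_increment(count, centroid, point):
    """Increase in a cluster's sum of squared errors if `point` joins it."""
    sq_dist = sum((p - c) ** 2 for p, c in zip(point, centroid))
    return count / (count + 1) * sq_dist

def assign_incrementally(clusters, point):
    """Add `point` to the cluster with the smallest SSE increment; return its index."""
    best = min(
        range(len(clusters)),
        key=lambda i: sse_increment(clusters[i]["count"], clusters[i]["centroid"], point),
    )
    cluster = clusters[best]
    n = cluster["count"]
    # Running-mean update of the centroid, so old points need not be stored.
    cluster["centroid"] = [(n * c + p) / (n + 1) for c, p in zip(cluster["centroid"], point)]
    cluster["count"] = n + 1
    return best

# Hypothetical clusters summarised by size and centroid.
clusters = [
    {"count": 3, "centroid": [0.0, 0.0]},
    {"count": 2, "centroid": [10.0, 10.0]},
]
chosen = assign_incrementally(clusters, [1.0, 1.0])
```

Keeping only a count and a centroid per cluster means each new lifelog record can be placed in constant time per cluster, which suits data that arrives over time.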

Design of Face Recognition algorithm Using PCA&LDA combined for Data Pre-Processing and Polynomial-based RBF Neural Networks (PCA와 LDA를 결합한 데이터 전 처리와 다항식 기반 RBFNNs을 이용한 얼굴 인식 알고리즘 설계)

  • Oh, Sung-Kwun;Yoo, Sung-Hoon
    • The Transactions of The Korean Institute of Electrical Engineers / v.61 no.5 / pp.744-752 / 2012
  • In this study, polynomial-based radial basis function neural networks (pRBFNNs) are proposed as the recognition part of an overall face recognition system that consists of two parts: a preprocessing part and a recognition part. The design methodology and procedure of the proposed pRBFNNs are presented to obtain solutions to high-dimensional pattern recognition problems. In the data preprocessing part, Principal Component Analysis (PCA), which is generally used in face recognition, is useful for expressing classes in reduced dimensions because it maintains the recognition rate while reducing the amount of data. However, because PCA works on the whole face image, it cannot guarantee the detection rate when the viewpoint or the whole image changes. To compensate for this defect, Linear Discriminant Analysis (LDA) is used to enhance the separation of different classes. In this paper, we combine the PCA and LDA algorithms and design optimized pRBFNNs for the recognition module. The proposed pRBFNNs architecture consists of three functional modules, the condition part, the conclusion part, and the inference part, expressed as fuzzy rules in 'If-then' format. In the condition part of the fuzzy rules, the input space is partitioned with Fuzzy C-Means clustering. In the conclusion part, the connection weights of the pRBFNNs are represented by two kinds of polynomials, constant and linear. The coefficients of the connection weights are identified with back-propagation using the gradient descent method. The output of the pRBFNNs model is obtained by the fuzzy inference method in the inference part of the fuzzy rules. The essential design parameters of the networks (including learning rate, momentum coefficient, and fuzzification coefficient) are optimized by means of Differential Evolution. The proposed pRBFNNs are applied to face image datasets (e.g., Yale, AT&T) and evaluated in terms of output performance and recognition rate.
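
The role of the fuzzification coefficient in Fuzzy C-Means can be seen in the standard membership formula, u_i = 1 / Σ_k (d_i / d_k)^(2/(m−1)), where d_i is the distance from a point to center i and m > 1 is the fuzzification coefficient. The sketch below shows only this membership step for fixed centers. It is not the authors' pRBFNNs, and `fcm_memberships` and the toy centers are hypothetical.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Fuzzy C-Means membership of `point` in each cluster, for fixed centers.

    m is the fuzzification coefficient (m > 1); larger m gives softer memberships.
    """
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # The point coincides with one or more centers: share full membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

# Hypothetical one-dimensional example: centers at 0 and 4, point at 1.
memberships = fcm_memberships([1.0], [[0.0], [4.0]], m=2.0)
softer = fcm_memberships([1.0], [[0.0], [4.0]], m=3.0)
```

With m = 2 the point at 1 belongs to the nearer center with degree 0.9; raising m to 3 pulls the memberships toward each other (0.75 and 0.25), which is why this coefficient is worth tuning.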

Proposal of Analysis Method for Biota Survey Data Using Co-occurrence Frequency

  • Yong-Ki Kim;Jeong-Boon Lee;Sung Je Lee;Jong-Hyun Kang
    • Proceedings of the National Institute of Ecology of the Republic of Korea / v.5 no.3 / pp.76-85 / 2024
  • The purpose of this study is to propose a new method of analysis focusing on interconnections between species rather than traditional biodiversity analysis, which represents ecosystems in terms of species and individual counts such as species diversity and species richness. This new approach aims to enhance our understanding of ecosystem networks. Utilizing data from the 4th National Natural Environment Survey (2014-2018), the following eight taxonomic groups were targeted for our study: herbaceous plants, woody plants, butterflies, Passeriformes birds, mammals, reptiles & amphibians, freshwater fishes, and benthonic macroinvertebrates. A co-occurrence frequency analysis was conducted using nationwide data collected over five years. As a result, in all eight taxonomic groups, the degree value represented by a linear regression trend line showed a slope of 0.8 and the weighted degree value showed an exponential nonlinear curve trend line with a coefficient of determination ($R^2$) exceeding 0.95. The average value of the clustering coefficient was also around 0.8, reminiscent of well-known social phenomena. Creating a combination set from the species list grouped by temporal information such as survey date and spatial information such as coordinates or grids is an easy approach to discern species distributed regionally and locally. Particularly, grouping by species or taxonomic groups to produce data such as co-occurrence frequency between survey points could allow us to discover spatial similarities based on species present. This analysis could overcome limitations of species data. Since there are no restrictions on time or space, data collected over a short period in a small area and long-term national-scale data can be analyzed through appropriate grouping.
The co-occurrence frequency analysis enables us to measure how many species are associated with a single species and the frequency of associations among each species, which will greatly help us understand ecosystems that seem too complex to comprehend. Such connectivity data and graphs generated by the co-occurrence frequency analysis of species are expected to provide a wealth of information and insights not only to researchers, but also to those who observe, manage, and live within ecosystems.
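
The "combination set from the species list grouped by" survey date or location can be sketched as pair counting. The example below assumes each group is simply a list of species names recorded together; the species and surveys are invented, and `cooccurrence` is a hypothetical helper, not the authors' code. Degree is taken as the number of distinct co-occurring species and weighted degree as the total co-occurrence count.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cooccurrence(groups):
    """Count species pairs that appear in the same group (survey date, grid cell, ...)."""
    pair_counts = Counter()
    for species in groups:
        # Sort and de-duplicate so each unordered pair has a single key.
        for a, b in combinations(sorted(set(species)), 2):
            pair_counts[(a, b)] += 1
    degree = defaultdict(int)           # number of distinct partner species
    weighted_degree = defaultdict(int)  # total number of co-occurrences
    for (a, b), weight in pair_counts.items():
        for s in (a, b):
            degree[s] += 1
            weighted_degree[s] += weight
    return pair_counts, dict(degree), dict(weighted_degree)

# Hypothetical survey records, one species list per survey point.
toy_surveys = [
    ["oak", "jay", "squirrel"],
    ["oak", "jay"],
    ["pine", "jay"],
]
pairs, degree, weighted = cooccurrence(toy_surveys)
```

The resulting `pairs` counter is a weighted edge list, so a clustering coefficient like the one reported in the abstract could be computed from it with any graph routine.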

Quadrilateral-Triangular Mixed Grid System for Numerical Analysis of Incompressible Viscous Flow (비압축성 점성 유동의 수치적 해석을 위한 사각형-삼각형 혼합 격자계)

  • 심은보;박종천;류하상
    • Korean Journal of Computational Design and Engineering / v.1 no.1 / pp.56-64 / 1996
  • A quadrilateral-triangular mixed grid method for the solution of incompressible viscous flow is presented. The solution domain near the body surface is meshed using an elliptic grid generator to accurately simulate the viscous flow. In the rest of the computational domain, we use an unstructured triangular grid generated by the advancing front technique, a simple automatic grid generation algorithm. The present method is thus capable not only of handling complex geometries but also of providing accurate solutions near the body surface. The numerical technique adopted here is a PISO-type finite element method developed by the present author. Two-dimensional unsteady flow past a circular cylinder at Re=550 is investigated. When only the unstructured grid is used, the drag coefficient and the vorticity at the cylinder surface differ considerably from existing results; this may be due to the lack of grid clustering near the surface, which is an inevitable requirement for resolving the viscous flow. However, numerical results on the mixed grid show good agreement with earlier computations and experimental data.

Water Demand Forecasting by Characteristics of City Using Principal Component and Cluster Analyses

  • Choi, Tae-Ho;Kwon, O-Eun;Koo, Ja-Yong
    • Environmental Engineering Research / v.15 no.3 / pp.135-140 / 2010
  • Because urban characteristics vary from city to city, existing water demand prediction, which uses average liters per capita per day, cannot achieve accurate predictions, as it fails to consider several variables. This study therefore considered social and industrial factors of 164 local cities, in addition to population and other directly influential factors, and used principal component and cluster analyses to develop a more efficient water demand prediction model that reflects the unique local characteristics of each city. After clustering, multiple regression models were developed: the $R^2$ value of the inclusive multiple regression model was 0.59, whereas those of Clusters A and B were 0.62 and 0.74, respectively. The cluster-based multiple regression models were thus considered more reasonable and valid than the inclusive multiple regression model. In summary, the water demand prediction model that uses principal component and cluster analyses as the standards for classifying localities has a better coefficient of determination than the inclusive multiple regression model, which does not consider localities.

Human Development Convergence and the Impact of Funds Transfer to Regions: A Dynamic Panel Data Approach

  • GINANJAR, Rah Adi Fahmi;ZAHARA, Vadilla Mutia;SUCI, Stannia Cahaya;SUHENDRA, Indra
    • The Journal of Asian Finance, Economics and Business / v.7 no.12 / pp.593-604 / 2020
  • This study analyzes human development convergence and the impact of funds transfer to the regions using σ- and β-convergence analysis methods. Observations were made in all of Indonesia's provinces over the period 2010-2019. The coefficient of variation shows a reduction in the dispersion of human development, which means that convergence occurred. This is also documented by the results of the clustering analysis developed in the study. The results are in line with the hypothesis of neoclassical theory, which predicts that provinces with lower human development levels tend to grow relatively faster. The dynamic panel data approach with the GMM model shows that a model built with explanatory variables for the transfer of funds to regions may lead to convergence of human development at a rate of 2.21% per year, or about 31 years to reach the half-life of convergence. This is a consequence of the Special Allocation Fund and the Village Fund, which positively affect the convergence process, and of the General Allocation Fund and the Revenue Sharing Fund, whose negative signs slow the convergence process. This evidence opens opportunities to review the justification of the weighting component in determining the amount of funds transferred to the regions in order to accelerate the convergence of human development.
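
The link between the 2.21% annual rate and the roughly 31-year half-life follows from the usual half-life relation for β-convergence, t = ln 2 / β. The sketch below assumes this continuous-time form; the abstract does not state which formula the authors used, and `convergence_half_life` is a hypothetical helper.

```python
import math

def convergence_half_life(rate_per_year):
    """Years needed to close half of the gap at a constant convergence rate."""
    if rate_per_year <= 0:
        raise ValueError("the convergence rate must be positive")
    return math.log(2) / rate_per_year

# Convergence rate reported in the abstract: 2.21% per year.
half_life = convergence_half_life(0.0221)
```

The result is a little over 31 years, in line with the figure quoted in the abstract.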

Design of Black Plastics Classifier Using Data Information (데이터 정보를 이용한 흑색 플라스틱 분류기 설계)

  • Park, Sang-Beom;Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers / v.67 no.4 / pp.569-577 / 2018
  • In this paper, a black plastic classifier based on a preprocessing algorithm is designed with the aid of information contained in the data. The slope and area of the spectrum obtained by laser-induced breakdown spectroscopy (LIBS) are analyzed for each material, and the resulting information is applied as the input data of the proposed classifier. The slope is represented by the rate of change of wavelength and intensity. The area is calculated from the wavelength of the spectrum peak where the material properties of chemical elements such as carbon and hydrogen appear. Using information such as slope and area, the input data of the proposed classifier are constructed. In the preprocessing part of the classifier, Principal Component Analysis (PCA) and the fuzzy transform are used to reduce high-dimensional input variables to low-dimensional ones. This improves both the characteristic analysis of the materials and the processing speed of the classifier. In the condition part, FCM clustering is applied, and a linear function is used as the connection weight in the conclusion part. Parameters such as the number of clusters, the fuzzification coefficient, and the number of input variables are optimized by means of Particle Swarm Optimization (PSO). To demonstrate the superiority of the classification performance, the classification rate is compared with those of various classifiers, such as Naive Bayes, SVM, and Multilayer Perceptron, contained in the WEKA 3.8 data mining software.