• Title/Summary/Keyword: Clustering Problem


Deconstructing Opinion Survey: A Case Study

  • Alanazi, Entesar
    • International Journal of Computer Science & Network Security / v.21 no.4 / pp.52-58 / 2021
  • Questionnaires and surveys are increasingly being used to collect information from participants of empirical software engineering studies. Usually, such data is analyzed using statistical methods to show an overall picture of participants' agreement or disagreement. In general, the whole survey population is treated as one group, with some methods used to extract variation. Sometimes there are different opinions within the same group, but they are not well discovered. In some analyses, the population is divided into subgroups according to some data, yet the opinions of different segments of the population may be the same. Even though the existing approach can capture the general trends, there is a risk that the opinions of different sub-groups are lost. The problem becomes more complex in longitudinal studies, where minority opinions might fade over time. Longitudinal survey data may include several interesting patterns that can be extracted using a clustering process. Clustering can discover new information and draw attention to different opinions. We suggest using a data mining approach to find the diversity among the different groups in longitudinal studies. Our study shows that diversity can be revealed and tracked over time using the clustering approach, and that minorities have an opportunity to be heard.

Improving Web Service Recommendation using Clustering with K-NN and SVD Algorithms

  • Weerasinghe, Amith M.;Rupasingha, Rupasingha A.H.M.
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.5 / pp.1708-1727 / 2021
  • At the advent of the twenty-first century, human beings began to interact closely with technology. Today, technology continues to develop, and as a result the World Wide Web (WWW) has a very important place on the Internet, with Web services fulfilling a significant task. Many Web services are available on the Internet, so it is difficult to find matching Web services among them. Recommendation systems can help solve this problem. In this paper, we focus on recommendation methods such as the collaborative filtering (CF) technique, which suffers from the data sparsity and cold-start problems. To overcome these problems, we first applied ontology-based clustering and then the k-nearest neighbor (KNN) algorithm to each separate cluster group, which effectively increased the data density using past user interests. User ratings were then predicted with a model-based approach, singular value decomposition (SVD), and the predictions were used for the recommendation. The evaluation results showed that our proposed approach has a lower prediction error rate and higher accuracy than the existing recommendation methods analyzed.

Multi-view Clustering by Spectral Structure Fusion and Novel Low-rank Approximation

  • Long, Yin;Liu, Xiaobo;Murphy, Simon
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.3 / pp.813-829 / 2022
  • In multi-view subspace clustering, how to integrate the complementary information between perspectives to construct a unified representation is a critical problem. In existing works, the unified representation is usually constructed in the original data space. However, when the data representation in each view is very diverse, a unified representation derived directly in the original data domain may lead to a huge information loss. To address this issue, unlike existing works and inspired by the recent finding that the data across all perspectives have a very similar or close spectral block structure, we construct the unified representation in the spectral embedding domain. In this way, the complementary information across all perspectives can be fused into a unified representation with little information loss, since the spectral block structure from all views is highly consistent. In addition, to capture the global structure of the data in each view with both high accuracy and robustness, we propose a novel low-rank approximation via a tight lower bound on the rank function. Finally, experimental results show that the proposed method is both effective and robust compared with state-of-the-art approaches.

Community Detection using Closeness Similarity based on Common Neighbor Node Clustering Entropy

  • Jiang, Wanchang;Zhang, Xiaoxi;Zhu, Weihua
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.8 / pp.2587-2605 / 2022
  • To detect community structure in complex networks efficiently, community detection algorithms can be designed from the perspective of node similarity. However, appropriate parameters must be chosen to achieve community division. Furthermore, existing algorithms based on the similarity of common neighbors discriminate poorly between node pairs. To solve these problems, a novel community detection algorithm using closeness similarity based on common neighbor node clustering entropy is proposed, abbreviated as CSCDA. Firstly, to improve detection accuracy, common neighbors and the clustering coefficient are combined in the form of entropy, and a new closeness similarity measure is proposed. Through the designed similarity measure, the closeness similar node set of each node can be identified more accurately. Secondly, to reduce the randomness of the community detection result, node leadership is used, based on the closeness similar node set, to determine the most closeness similar first-order neighbor node for merging to create the initial communities. Thirdly, to address the difficult problem of parameter selection in existing algorithms, two levels of merging are used to iteratively detect the final communities with the idea of modularity optimization. Finally, experiments show that the normalized mutual information values are increased by an average of 8.06% and 5.94% on two scales of synthetic networks and real-world networks with real communities, and modularity is increased by an average of 0.80% on the real-world networks without real communities.
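
The abstract above builds its similarity measure from two standard graph quantities: the common neighbors of a node pair and the local clustering coefficient of a node. The sketch below computes only those two ingredients on a small, made-up undirected graph. It does not reproduce the paper's entropy-based closeness similarity, whose exact formula is not given in the abstract; the function names and the toy graph are assumptions for illustration.

```python
from itertools import combinations


def common_neighbors(adj, u, v):
    """Return the set of nodes adjacent to both u and v."""
    return adj[u] & adj[v]


def clustering_coefficient(adj, u):
    """Local clustering coefficient: the fraction of pairs of u's
    neighbors that are themselves connected by an edge."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for fewer than two neighbors; use 0 by convention
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


# Hypothetical toy graph: a triangle 0-1-2 plus a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

shared = common_neighbors(adj, 0, 1)   # nodes adjacent to both 0 and 1
cc_inner = clustering_coefficient(adj, 0)
cc_hub = clustering_coefficient(adj, 2)
```

Node 0 sits entirely inside the triangle, so all of its neighbor pairs are linked, while node 2 also touches the pendant node and therefore has a lower coefficient.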

Extended Kepler Grid-based System for Diabetes Study Workspace

  • Hazemi, Fawaz Al;Youn, Chan-Hyun
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.230-233 / 2011
  • Chronic disease is linked to patients' lifestyles. Therefore, a doctor has to monitor each patient over time. This may involve reviewing many reports, finding any changes, and modifying several treatments. One way to reduce this burden is a timeline-based visualization tool in which all reports and medications are integrated in a problem-centric, time-based style so that the doctor can predict and adjust the treatment plan. This solution was proposed by Bui et al. [2] to observe the medical history of a patient. However, it was limited in studying a diabetes patient's history to find the cause of the current development in the patient's condition and, moreover, to predict the current implication for one of the diabetes-related factors (such as fat, cholesterol, or potassium). In this paper, we propose a Grid-based Interactive Diabetes System (GIDS) to support bioinformatics analysis applications for diabetes. GIDS uses an agglomerative clustering algorithm, applied as a correlation clustering algorithm, as its primary algorithm to direct medical researchers to findings that predict implications for the diabetes patient under study. The algorithm is Chronological Clustering, proposed by P. Legendre [11] [12].

Deconstructing Agile Survey to Identify Agile Skeptics

  • Entesar Alanazi;Mohammad Mahdi Hassan
    • International Journal of Computer Science & Network Security / v.24 no.3 / pp.201-210 / 2024
  • In empirical software engineering research, there is an increased use of questionnaires and surveys to collect information from practitioners. Typically, such data is then analyzed using overall descriptive statistics. These analyses generally treat the whole survey population as a single group, with some sampling techniques used to extract variation. In some cases, the population is also partitioned into sub-groups based on some background information. However, this does not reveal opinion diversity properly, as similar opinions can exist in different segments of the population, whereas people within the same group might have different opinions. Even though the existing approach can capture the general trends, there is a risk that the opinions of different sub-groups are lost. The problem becomes more complex in the case of longitudinal studies, where minority opinions might fade or resolute over time. Survey-based longitudinal data may have some potential patterns that can be extracted through a clustering process. It may reveal new information and attract attention to alternative perspectives. We suggest using a data mining approach to find the diversity among the different groups in longitudinal studies (agile skeptics). In our study, we show that diversity can be revealed and tracked over time using a clustering approach, and that minorities have an opportunity to be heard.

A Geometrical Center based Two-way Search Heuristic Algorithm for Vehicle Routing Problem with Pickups and Deliveries

  • Shin, Kwang-Cheol
    • Journal of Information Processing Systems / v.5 no.4 / pp.237-242 / 2009
  • The classical vehicle routing problem (VRP) can be extended by including customers who want to send goods to the depot. This type of VRP is called the vehicle routing problem with pickups and deliveries (VRPPD). This study proposes a novel way to solve VRPPD by introducing a two-phase heuristic routing algorithm. It consists of a clustering phase that uses the geometrical center of a cluster and a route establishment phase that applies a two-way search to each route after applying the TSP algorithm to it. Experimental results show that the suggested algorithm can generate better initial solutions for more computer-intensive meta-heuristics than other existing methods such as the giant-tour-based partitioning method or the insertion-based method.

The study on representation, Digital coding and Clustering of odor information (후각정보 표현, 부호화 및 클러스터링에 관한 연구)

  • Kim, Jeong-Do;Jung, Suk-Woo;Kim, Dong-Jin
    • Proceedings of the KIEE Conference / 2004.11c / pp.598-601 / 2004
  • In this paper, we propose a method for converting odors into digital data. To this end, we selected emotional adjectives describing odors to serve as olfactory receptors. There are about 40 of these emotional adjectives (expressional receptors). Each odor is expressed by the adjectives that correspond to it. An odor expressed through these emotional receptors is encoded by the proposed method for transmission and, after transmission, must be decoded so that it can be expressed again. The decoding method applied is the fuzzy c-means clustering algorithm (FCMA). However, because odor data is expressed in 40 dimensions, FCMA requires a lot of computing time and memory. To solve this problem, we first reduce the dimensionality through principal component analysis (PCA) and then apply FCMA.
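
As a rough illustration of the clustering step named in this abstract, the following is a minimal sketch of the standard fuzzy c-means iteration using only the Python standard library. It is not the authors' implementation: the PCA reduction is omitted, the initial centers are passed in by hand, and the function name, the fuzzifier `m = 2.0`, and the toy 2-D points are all assumptions.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, iters=50, eps=1e-12):
    """Standard fuzzy c-means starting from the given initial centers.

    Returns (centers, memberships); memberships[j][i] is the degree to
    which point j belongs to cluster i, and each row sums to 1.
    """
    centers = [list(c) for c in centers]
    dims = len(points[0])
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: u_ji = 1 / sum_k (d_ji / d_jk)^(2/(m-1)).
        memberships = []
        for x in points:
            # Clamp distances so a point sitting on a center does not divide by zero.
            dists = [max(math.dist(x, c), eps) for c in centers]
            memberships.append(
                [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]
            )
        # Center update: mean of the points weighted by u_ji^m.
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            centers[i] = [
                sum(w * x[d] for w, x in zip(weights, points)) / total
                for d in range(dims)
            ]
    return centers, memberships


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, memberships = fuzzy_c_means(points, centers=[(0, 0), (10, 10)])
```

Each point receives a graded membership in every cluster rather than a single hard label. The cost of each iteration grows with the number of dimensions, which is the reason the abstract gives for applying PCA before clustering.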


A Study on the Feature Region Segmentation for the Analysis of Eye-fundus Images (안저영상 해석을 위한 특징영역의 분할에 관한 연구)

  • 강전권;한영환
    • Journal of Biomedical Engineering Research / v.16 no.2 / pp.121-128 / 1995
  • Information about retinal blood vessels can be used in grading disease severity or as part of the process of automated diagnosis of diseases with ocular manifestations. In this paper, we address the problem of detecting retinal blood vessels and the optic disk (papilla) in eye-fundus images. We introduce an algorithm for feature extraction based on a fuzzy clustering algorithm (fuzzy c-means). A method of finding the optic disk (papilla) in eye-fundus images is proposed. Additionally, information such as the position and area of the optic disk is extracted. The results are compared to those obtained from other methods. The automatic detection of retinal blood vessels and the optic disk in eye-fundus images could help physicians in diagnosing ocular diseases.


Simple Compromise Strategies in Multivariate Stratification

  • Park, Inho
    • Communications for Statistical Applications and Methods / v.20 no.2 / pp.97-105 / 2013
  • Stratification (among other applications) is a popular technique used in survey practice to improve the accuracy of estimators. Its full potential benefit can be gained by effectively using auxiliary variables that are related to the survey variables in stratification. This paper focuses on the problem of stratum formation when multiple stratification variables are available. We first review a variance reduction strategy in the case of univariate stratification. We then discuss its use for multivariate situations in convenient and efficient ways using three methods: compromised measures of size, principal components analysis and a K-means clustering algorithm. We also consider three types of compromising factors applied to the data when using these three methods. Finally, we compare their efficiency using data from the MU281 Swedish municipality population.
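
One of the three methods listed in this abstract, K-means clustering, can be sketched as follows: each population unit is a vector of stratification variables, and the resulting cluster labels serve as stratum assignments. This is a plain Lloyd's-algorithm sketch under stated assumptions, not the paper's procedure. The function name, the hand-picked initial centers, and the toy data are hypothetical, and in practice the variables would usually be put on a common scale first.

```python
import math


def kmeans_strata(units, centers, max_iters=100):
    """Assign each unit to a stratum with Lloyd's k-means algorithm.

    units   : list of tuples of stratification variables
    centers : initial stratum centers (one tuple per stratum)
    Returns (labels, centers), where labels[j] is the stratum of unit j.
    """
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each unit joins the stratum with the nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda i: math.dist(u, centers[i]))
            for u in units
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for i in range(len(centers)):
            members = [u for u, lab in zip(units, labels) if lab == i]
            if members:  # leave an empty stratum's center where it is
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical units described by two auxiliary variables each.
units = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (8.0, 9.0), (8.5, 9.5), (7.5, 8.5)]
labels, centers = kmeans_strata(units, centers=[units[0], units[3]])
```

Because the clustering uses all stratification variables at once, units that are similar across every variable end up in the same stratum, which is the compromise the multivariate setting calls for.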