• Title/Summary/Keyword: cluster set

Predictive Analysis of Financial Fraud Detection using Azure and Spark ML

  • Priyanka Purushu;Niklas Melcher;Bhagyashree Bhagwat;Jongwook Woo
    • Asia pacific journal of information systems / v.28 no.4 / pp.308-319 / 2018
  • This paper provides insights into financial fraud detection for mobile money transaction activity. We predict and classify each transaction as normal or fraudulent, using a small sample with Azure and a massive data set with Spark ML, which represent traditional systems and Big Data respectively. Experimenting with the sample data set in Azure, we found that the Decision Forest model is the most accurate in terms of recall. For the massive data set in Spark ML, the Random Forest classifier proves to be the best of the classification algorithms. We show that the Spark cluster builds and evaluates models much faster as more servers are added to the cluster, with the same accuracy, which indicates that large-scale data sets can be used for prediction on a Big Data platform. Finally, we reached a recall score of 0.73, which implies satisfactory quality in predicting fraudulent transactions.
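
For readers unfamiliar with the metric the abstract reports, recall is the fraction of actual fraud cases that the classifier flags. The sketch below is a minimal illustration only: the function name `recall` and the label lists are hypothetical and are not data or code from the paper.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the positive (fraud) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("y_true contains no positive examples")
    return tp / (tp + fn)


# Hypothetical labels (1 = fraud, 0 = normal); not taken from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
fraud_recall = recall(y_true, y_pred)  # 3 of the 4 fraud cases are caught
```

Recall ignores false alarms on normal transactions, which is why it is usually read alongside precision.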

Selection and Classification of Bacterial Strains Using Standardization and Cluster Analysis

  • Lee, Sang Moo;Kim, Kyoung Hoon;Kim, Eun Joong
    • Journal of Animal Science and Technology / v.54 no.6 / pp.463-469 / 2012
  • This study utilized a standardization and cluster analysis technique for the selection and classification of beneficial bacteria. A set of synthetic data consisting of 100 individual variables with three characteristics was created for analysis. The three characteristics assigned to each independent variable were designated to have different numeric scales, averages, and standard deviations. The variables represented random bacterial isolates, and the three characteristics were fermentation products, including cell yield, antioxidant activity of culture, and enzyme production. A standardization method based on the standard normal distribution equation was applied to the fermentation yields of each isolate to balance their different numeric scales and deviations. Following transformation, the data set was analyzed by cluster analysis. The Manhattan method for dissimilarity matrix construction, along with the complete linkage technique, an agglomerative method for hierarchical cluster analysis, was employed using the statistical computing program R. A total of 100 isolates were classified into groups A, B, and C. In a comparison of the characteristics of each group, all characteristics in groups A and C were higher than those of group B. Isolates displaying higher cell yield were classified as group A, whereas those isolates showing high antioxidant activity and enzyme production were assigned to group C. The results of the cluster analysis can be useful for the classification of numerous isolates and the preparation of an isolation pool using numerical or statistical tools. The present study suggests that a simple technique can be applied to screen and select beneficial microbes using the freely downloadable statistical computing program R.
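
The pipeline described in this abstract (standardize, build Manhattan dissimilarities, merge by complete linkage) can be sketched in a few lines. This is a plain-Python illustration, not the authors' R code. It assumes the standardization is a z-score using the population standard deviation, and the six isolates and their values are made up.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so traits on different scales become comparable."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]  # assumption: population standard deviation
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def manhattan(a, b):
    """Manhattan (city-block) dissimilarity between two isolates."""
    return sum(abs(x - y) for x, y in zip(a, b))


def complete_linkage(points, k):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    farthest members are closest, until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(manhattan(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical isolates: (cell yield, antioxidant activity, enzyme production).
isolates = [
    [9.0, 20.0, 100.0],
    [8.5, 22.0, 110.0],
    [3.0, 80.0, 900.0],
    [3.2, 85.0, 950.0],
    [2.0, 15.0, 120.0],
    [2.2, 18.0, 90.0],
]
groups = complete_linkage(standardize(isolates), k=3)
```

Without the standardization step the enzyme column, which has the largest numeric scale, would dominate the Manhattan distances.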

Galactic gas depletion process in cosmological hydrodynamic cluster zoom-in simulation

  • Jung, Seoyoung;Choi, Hoseung;Yi, Sukyoung K.
    • The Bulletin of The Korean Astronomical Society / v.42 no.2 / pp.76.1-76.1 / 2017
  • In cluster environments, most galaxies are found to be red and dead, but the origin of these passive galaxies is not yet clearly understood. Using a set of cosmological hydrodynamic zoom-in simulations, we study the gas depletion process inside and outside clusters. Our results are consistent with previous studies showing rapid stripping of the galactic cold gas reservoir during the first infall to the cluster center. Moreover, we found that the fraction of galaxies that were already in a gas-deficient state before reaching the cluster (i.e., pre-processed galaxies) is non-negligible. These findings lead to the idea that a complete understanding of the passive galaxy population in clusters cannot be achieved without a detailed understanding of the gas stripping process in group-size halos prior to cluster infall.

The Impact of Network Coding Cluster Size on Approximate Decoding Performance

  • Kwon, Minhae;Park, Hyunggon
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.3 / pp.1144-1158 / 2016
  • In this paper, delay-constrained data transmission is considered over error-prone networks. Network coding is deployed for efficient information exchange, and an approximate decoding approach is deployed to overcome potential all-or-nothing problems. Our focus is on determining the cluster size and its impact on approximate decoding performance. Decoding performance is quantified, and we show that performance is determined only by the number of packets. Moreover, the fundamental tradeoff between approximate decoding performance and data transfer rate improvement is analyzed: as the cluster size increases, the data transfer rate improves and decoding performance degrades. This tradeoff can lead to an optimal cluster size for network coding-based networks that achieves the target decoding performance of applications. A set of experimental results confirms the analysis.

Improved Classification Algorithm using Extended Fuzzy Clustering and Maximum Likelihood Method

  • Jeon Young-Joon;Kim Jin-Il
    • Proceedings of the IEEK Conference / summer / pp.447-450 / 2004
  • This paper proposes a classification method for remotely sensed images based on the fuzzy c-means clustering algorithm using average intra-cluster distance. The average intra-cluster distance is the average over the set of vectors belonging to each cluster and is proportional to the cluster's size and density. We classify each pixel according to its membership grade with respect to the cluster centers of fuzzy c-means clustering, using the mean values of the training data for each class. The fuzzy c-means algorithm considers the inter-cluster membership degree of each class. We then check the degree of overlap between clusters. A pixel with a high degree of overlap is passed to the maximum likelihood classification method. Finally, we decide the category by comparing the fuzzy membership degree with the likelihood rate. The proposed method is applied to an IKONOS remote sensing satellite image for verification.
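
To make the notion of a pixel's membership grade concrete, the sketch below computes the standard fuzzy c-means memberships for one pixel against fixed cluster centers. It does not implement the paper's extension with average intra-cluster distance. The function name, the two centers (standing in for class means from training data), and the fuzzifier `m = 2` are illustrative assumptions.

```python
import math


def fcm_memberships(pixel, centers, m=2.0):
    """Standard fuzzy c-means membership of one pixel in each cluster:
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))."""
    dists = [math.dist(pixel, c) for c in centers]
    # A pixel lying exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]


# Hypothetical class means in a two-band feature space.
centers = [(0.0, 0.0), (10.0, 0.0)]
clear = fcm_memberships((1.0, 0.0), centers)      # strongly in the first class
ambiguous = fcm_memberships((5.0, 0.0), centers)  # equally split between both
```

A pixel such as `ambiguous`, whose memberships are nearly equal, is the kind of high-overlap case that the abstract says is handed to the maximum likelihood classifier.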

The Low Power Algorithm using a Feasible Cluster Generation Method considered Glitch (글리치를 고려한 매핑가능 클러스터 생성 방법을 이용한 저전력 알고리즘)

  • Kim, Jaejin
    • Journal of Korea Society of Digital Industry and Information Management / v.12 no.2 / pp.7-14 / 2016
  • This paper presents a low-power algorithm using a feasible cluster generation method that takes glitches into account. The proposed algorithm reduces the power consumption of a given circuit. It consists of a feasible cluster generation process and a glitch removal process. To reduce power consumption, feasible clusters are generated so that glitches do not occur at the node where switching occurs most frequently. The feasible cluster generation process consists of setting node values, dividing nodes, aligning nodes, and generating feasible clusters. Feasible clusters are generated starting from the node with the highest number of outputs. When the number of OR-terms at the inputs of the selected node exceeds that of a CLB, the node is divided evenly to prevent the signal path from varying. If several nodes have the same number of outputs, the feasible cluster is generated first from the node with the highest number of inputs. The glitch removal process removes glitches through path balancing in the same manner as [5]. In the experiments, the proposed algorithm was compared with that of [5]. The number of blocks increased by 5%, while power consumption was reduced by 3%.

A Method in Evaluating Mechanical Design Plans With Fuzzy Theory

  • Faliang, Gao
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 1993.06a / pp.1163-1166 / 1993
  • This paper studies the evaluation of mechanical design plans through fuzzy clustering. Plans are classified into two sets, 'good' and 'bad'. The membership of a plan in the 'good' set is numerically equal to its distance from the 'bad' set. The central parameter of the 'good' set is defined as '1', and that of the 'bad' set as '0', which greatly simplifies the calculations. A worked calculation example shows that the method is practicable.

ITERATING A SYSTEM OF SET-VALUED VARIATIONAL INCLUSION PROBLEMS IN SEMI-INNER PRODUCT SPACES

  • Shafi, Sumeera
    • The Pure and Applied Mathematics / v.29 no.4 / pp.255-275 / 2022
  • In this paper, we introduce a new system of set-valued variational inclusion problems in semi-inner product spaces. We use the resolvent operator technique to propose an iterative algorithm for computing an approximate solution of the system of set-valued variational inclusion problems. The results presented in this paper generalize, improve, and unify many previously known results in the literature.

Improved Multidimensional Scaling Techniques Considering Cluster Analysis: Cluster-oriented Scaling (클러스터링을 고려한 다차원척도법의 개선: 군집 지향 척도법)

  • Lee, Jae-Yun
    • Journal of the Korean Society for information Management / v.29 no.2 / pp.45-70 / 2012
  • Many methods and algorithms have been proposed for multidimensional scaling, which maps the relationships between data objects into a low-dimensional space. However, traditional techniques such as PROXSCAL or ALSCAL were found to be ineffective for visualizing the proximities between objects and the cluster structure of large data sets with more than 50 objects. The CLUSCAL (CLUster-oriented SCALing) technique introduced in this paper differs from them chiefly in that it uses the cluster structure of the input data set. The CLUSCAL procedure was tested and evaluated on two data sets: co-citation data for 50 authors and co-occurrence data for 85 words. The results suggest that the CLUSCAL method is promising, especially for identifying clusters on MDS maps.

Document Clustering Method using Coherence of Cluster and Non-negative Matrix Factorization (비음수 행렬 분해와 군집의 응집도를 이용한 문서군집)

  • Kim, Chul-Won;Park, Sun
    • Journal of the Korea Institute of Information and Communication Engineering / v.13 no.12 / pp.2603-2608 / 2009
  • Document clustering is an important method for document analysis and is used in many different information retrieval applications. This paper proposes a new document clustering model that combines a clustering method based on NMF (non-negative matrix factorization) with refinement of the documents in each cluster using cluster coherence. The proposed method can improve the quality of document clustering for two reasons: documents are re-assigned to clusters using cluster coherence based on the similarity between documents, and the semantic feature matrix and semantic variable matrix used in document clustering can better represent the inherent structure of the document set. The experimental results demonstrate that applying the proposed method to document clustering achieves better performance than other document clustering methods.