• Title/Summary/Keyword: k-means algorithms


Development of a Knowledge Discovery System using Hierarchical Self-Organizing Map and Fuzzy Rule Generation

  • Koo, Taehoon;Rhee, Jongtae
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.01a / pp.431-434 / 2001
  • Knowledge discovery in databases (KDD) is the process of extracting valid, novel, potentially useful, and understandable knowledge from real data. There are many academic and industrial activities with new technologies and application areas. In particular, data mining is the core step in the KDD process, consisting of many algorithms that perform clustering, pattern recognition, and rule induction. The main goals of these algorithms are prediction and description. Prediction means the assessment of unknown variables. Description is concerned with providing understandable results in a format compatible with human users. We introduce an efficient data mining algorithm that considers both predictive and descriptive capability. Reasonable patterns are derived from real-world data by a revised neural network model, and a proposed fuzzy rule extraction technique is applied to obtain understandable knowledge. The proposed neural network model is a hierarchical self-organizing system. The rule base is compatible with decision makers' perception because the generated fuzzy rule set reflects the human information process. Results from a real-world application are analyzed to evaluate the system's performance.


Self-adaptive Online Sequential Learning Radial Basis Function Classifier Using Multi-variable Normal Distribution Function

  • Dong, Keming;Kim, Hyoung-Joong;Suresh, Sundaram
    • 한국정보통신설비학회:학술대회논문집 / 2009.08a / pp.382-386 / 2009
  • Online, or sequential, learning is one of the most basic and powerful methods for training neural networks, and it has been widely used in disease detection, weather prediction, and other real-world classification problems. At present, there are many algorithms in this area, such as MRAN, GAP-RBFN, OS-ELM, SVM, and SMC-RBF. Among them, SMC-RBF has the best performance: it uses fewer hidden neurons and has the best efficiency. However, all the existing algorithms use a single-variable normal distribution as the kernel function, which means the output of the kernel function is the same in every direction. In this paper, we use a multi-variable normal distribution as the kernel function and derive the EKF learning formulas for it. From the experimental results, we can deduce that the proposed method has better efficiency and is not sensitive to the data sequence.
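
The difference between a single-width kernel and a multi-variable normal kernel can be illustrated with a small sketch. This is only a two-dimensional illustration: the function names and the covariance values are assumptions chosen for the example, and the EKF learning formulas derived in the paper are not reproduced here.

```python
import math

def isotropic_kernel(x, center, sigma):
    """Single-width Gaussian: the response depends only on the distance
    from the center, so it is identical in every direction."""
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def multivariate_kernel_2d(x, center, cov):
    """Unnormalized 2-D Gaussian with a full covariance matrix
    cov = [[a, b], [b, c]]; the response can differ by direction."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = x[0] - center[0], x[1] - center[1]
    # Squared Mahalanobis distance, using the closed-form 2x2 inverse.
    maha = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * maha)

center = (0.0, 0.0)
wide_in_x = [[4.0, 0.0], [0.0, 1.0]]  # hypothetical covariance: wider along x

same_x = isotropic_kernel((1.0, 0.0), center, sigma=1.0)
same_y = isotropic_kernel((0.0, 1.0), center, sigma=1.0)
along_x = multivariate_kernel_2d((1.0, 0.0), center, wide_in_x)
along_y = multivariate_kernel_2d((0.0, 1.0), center, wide_in_x)
```

With the isotropic kernel, two points at the same distance get the same response; with the covariance above, the point along the wider axis gets the larger one.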


A Variable Step Size LMS Algorithm Using Normalized Absolute Estimation Error

  • Kim, D. W.;S. H. Han;H. K. Hong;H. B. Kang;Park, J. S.
    • Journal of Electrical Engineering and Information Science / v.1 no.2 / pp.119-124 / 1996
  • Variable step size LMS (VS-LMS) algorithms improve the performance of the LMS algorithm by varying the step size. This paper presents a new VS-LMS algorithm using the normalized absolute estimation error. Normalizing the estimation error to the expected value of the desired signal, we determine the step size from the relative size of the estimation error. Because it needs fewer parameters and less computation, our algorithm is easy to implement in hardware. The performance of the proposed algorithm is analyzed theoretically and estimated through simulations. Based on the theoretical analysis and computer simulations, the proposed algorithm is shown to be effective compared to conventional VS-LMS algorithms.
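
The idea of driving the step size from a normalized absolute error can be sketched as below. The abstract does not give the actual update rule, so the rule used here (step size proportional to |e| divided by a running average of |d|, capped at `mu_max`) is an assumption made for illustration, as are all names and default values.

```python
def vs_lms(inputs, desired, num_taps, mu_max=0.1, forget=0.99, eps=1e-8):
    """LMS adaptive filter whose step size follows the absolute error
    relative to a running estimate of E|d| (illustrative rule, not the
    formula from the paper)."""
    weights = [0.0] * num_taps
    d_level = 0.0  # running estimate of the expected |desired signal|
    errors = []
    for n in range(len(inputs)):
        # Tap vector: most recent sample first, zero-padded at the start.
        x = [inputs[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(w * xi for w, xi in zip(weights, x))
        e = desired[n] - y
        d_level = forget * d_level + (1.0 - forget) * abs(desired[n])
        # Large relative error -> step near mu_max; small error -> small step.
        mu = mu_max * min(1.0, abs(e) / (d_level + eps))
        weights = [w + mu * e * xi for w, xi in zip(weights, x)]
        errors.append(e)
    return weights, errors
```

The only extra state beyond plain LMS is the scalar `d_level`, which is in the spirit of the low parameter count and computational load the abstract emphasizes.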


A Detection Model using Labeling based on Inference and Unsupervised Learning Method (추론 및 비교사학습 기법 기반 레이블링을 적용한 탐지 모델)

  • Hong, Sung-Sam;Kim, Dong-Wook;Kim, Byungik;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.18 no.1 / pp.65-75 / 2017
  • A detection model is a model that finds results for a given purpose using artificial intelligence, data mining, and intelligent algorithms. In cyber security, it is usually used to detect intrusions, malware, cyber incidents, and attacks. A large amount of unlabeled data, such as security data, is collected in real environments. Since most of the data have no defined class labels, it is difficult to know the type of the data. Therefore, a label determination process is required for accurate detection and analysis. In this paper, we propose the KDFL (K-means and D-S Fusion based Labeling) method, which uses D-S inference and the k-means (unsupervised) algorithm to decide the labels of data records by fusion, and a detection model architecture that uses the proposed labeling method. The proposed method shows a better detection rate, accuracy, and F1-measure than other methods. In addition, since it also shows improved error rates, we have verified the good performance of the proposed method.
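
The D-S fusion step can be illustrated with Dempster's rule of combination. The abstract does not say how KDFL derives its mass functions from the k-means output, so the two evidence sources and their mass values below are hypothetical; only the combination rule itself is standard.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions whose keys
    are frozensets of hypotheses."""
    fused = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            fused[common] = fused.get(common, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in fused.items()}

ATTACK = frozenset({"attack"})
NORMAL = frozenset({"normal"})
EITHER = ATTACK | NORMAL  # ignorance: mass not committed to either label

# Hypothetical evidence for one record, e.g. from a cluster and a rule.
cluster_evidence = {ATTACK: 0.6, NORMAL: 0.1, EITHER: 0.3}
rule_evidence = {ATTACK: 0.5, NORMAL: 0.2, EITHER: 0.3}

fused = combine(cluster_evidence, rule_evidence)
label = max((ATTACK, NORMAL), key=lambda h: fused.get(h, 0.0))
```

The record receives the label with the largest fused mass; the mass left on `EITHER` indicates how much uncertainty remains after fusion.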

A New Item Recommendation Procedure Using Preference Boundary

  • Kim, Hyea-Kyeong;Jang, Moon-Kyoung;Kim, Jae-Kyeong;Cho, Yoon-Ho
    • Asia Pacific Journal of Information Systems / v.20 no.1 / pp.81-99 / 2010
  • Lately, the number of new items in consumer markets is increasing at an overwhelming rate, while consumers have limited access to the information about those new products that they need to make a sensible, well-informed purchase. Therefore, item providers and customers need a system that recommends the right items to the right customers. Also, whenever new items are released, a recommender system specializing in new items can help item providers locate and identify potential customers. Currently, new items are added to an existing system without being specially noted to consumers, making it difficult for consumers to identify and evaluate new products introduced in the markets. Most previous approaches to recommender systems have to rely on the usage history of customers. For new items, this content-based (CB) approach is simply not available for the system to recommend those new items to potential consumers. Although the collaborative filtering (CF) approach is not directly applicable to the new item problem, it would be a good idea to use the basic principle of CF, which identifies similar customers, i.e., neighbors, and recommends items to those customers who have liked similar items in the past. This research aims to suggest a hybrid recommendation procedure based on the preference boundary of the target customer. We suggest the hybrid recommendation procedure using the preference boundary in the feature space for recommending new items only. The basic principle is that if a new item lies within the preference boundary of a target customer, then it is evaluated to be preferred by the customer. Customers' preferences and the characteristics of items, including new items, are represented in a feature space, and the scope or boundary of the target customer's preference is extended to those of the neighbors. The new item recommendation procedure consists of three steps. 
The first step is analyzing the profile of items, which are represented as k-dimensional feature values. The second step is to determine the representative point of the target customer's preference boundary, the centroid, based on a personal information set. To determine the centroid of the preference boundary of a target customer, three algorithms are developed in this research: one uses the centroid of the target customer only (TC), another uses the centroid of a (dummy) big target customer composed of the target customer and his/her neighbors (BC), and the third uses the centroids of the target customer and his/her neighbors (NC). The third step is to determine the range of the preference boundary, the radius. The suggested algorithm uses the average distance (AD) between the centroid and all purchased items. We test whether the CF-based approach to determining the centroid of the preference boundary improves the recommendation quality. For this purpose, we develop two hybrid algorithms, BC and NC, which use neighbors when deciding the centroid of the preference boundary. To test the validity of the hybrid algorithms, BC and NC, we developed a CB algorithm, TC, which uses target customers only. We measured the effectiveness scores of the suggested algorithms and compared them through a series of experiments with a set of real mobile image transaction data. We split the data into a training set, covering the period between 1st June 2004 and 31st July 2004, and a test set, covering the period between 1st August 2004 and 31st August 2004. The training set is used to build the preference boundary, and the test set is used to evaluate the performance of the suggested hybrid recommendation procedure. The main aim of this research is to compare the hybrid recommendation algorithm with the CB algorithm. To evaluate the performance of each algorithm, we compare the list of new items purchased in the test period with the list of items recommended by the suggested algorithms. 
We therefore employ the hit ratio as the evaluation metric for our algorithms. The hit ratio is defined as the ratio of the hit set size to the recommended set size. The hit set size means the number of successful recommendations in our experiment, and the test set size means the number of items purchased during the test period. The experimental results show that the hit ratios of BC and NC are higher than that of TC. This means that using neighbors is more effective for recommending new items. That is, the hybrid algorithm using CF is more effective at recommending new items to consumers than the algorithm using only CB. The reason the hit ratio of BC is smaller than that of NC is that BC is defined as a dummy or virtual customer who purchased all the items of the target customer and the neighbors. That is, the centroid of BC often shifts from that of TC, so it tends to reflect skewed characteristics of the target customer. The recommendation algorithm using NC therefore shows the best hit ratio, because NC has sufficient information about target customers and their neighbors without damaging the information about the target customers.
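
The three steps and the hit ratio can be sketched for the simplest variant, TC, in which only the target customer's own purchases define the boundary. The item names and feature vectors below are made up for illustration, and the neighbor-based BC and NC variants are not shown.

```python
import math

def centroid(points):
    """Mean feature vector of the purchased items."""
    return [sum(col) / len(points) for col in zip(*points)]

def average_distance(center, points):
    """Radius of the preference boundary: mean distance to the centroid (AD)."""
    return sum(math.dist(center, p) for p in points) / len(points)

def recommend(new_items, purchased):
    """Recommend every new item that falls inside the preference boundary."""
    center = centroid(purchased)
    radius = average_distance(center, purchased)
    return [name for name, features in new_items.items()
            if math.dist(center, features) <= radius]

def hit_ratio(recommended, purchased_in_test):
    """Hit set size divided by recommended set size."""
    if not recommended:
        return 0.0
    hits = set(recommended) & set(purchased_in_test)
    return len(hits) / len(recommended)

# Hypothetical 2-dimensional item features.
purchased = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
new_items = {"a": (1.0, 1.5), "b": (5.0, 5.0), "c": (2.0, 1.0)}

recommended = recommend(new_items, purchased)
ratio = hit_ratio(recommended, purchased_in_test=["a", "z"])
```

Here the centroid is (1, 1) and the radius is the square root of 2, so items "a" and "c" are recommended and "b" is not; one of the two recommendations is a hit.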

Analysis of Academic Achievement Data Using AI Cluster Algorithms (AI 군집 알고리즘을 활용한 학업 성취도 데이터 분석)

  • Koo, Dukhoi;Jung, Soyeong
    • Journal of The Korean Association of Information Education / v.25 no.6 / pp.1005-1013 / 2021
  • With the prolonged COVID-19 pandemic, the existing academic gap is widening. The purpose of this study is to give homeroom teachers a visual confirmation of the academic achievement gap in grades and classrooms through academic achievement analysis, and to use this to help them design lessons and explore ways to reduce the academic achievement gap. The students' Korean and math diagnostic evaluation scores from the beginning of the school year were visualized as clusters using the K-means algorithm, and it was confirmed that meaningful clusters were formed. In addition, the teacher interviews confirmed that this system was meaningful in reducing the academic achievement gap, for example in checking students' learning levels and academic achievement and in designing classes such as individual supplementary instruction and level-specific learning. This means that this academic achievement data analysis system helps to reduce the academic gap. This study provides practical help to homeroom teachers in exploring ways to reduce the academic gap in grades and classes, and is expected to ultimately contribute to reducing the academic gap.
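
Clustering two-subject scores with K-means can be sketched with a plain Lloyd's iteration. The abstract does not say which tool or settings the study used; the scores, the choice of k = 2, and the deterministic initialization from the first k points are assumptions made for this example.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain Lloyd's algorithm. Initial centroids are the first k points,
    a deterministic simplification chosen for this sketch."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

# Hypothetical (Korean, math) diagnostic scores for six students.
scores = [(90, 95), (40, 35), (85, 92), (45, 30), (88, 90), (38, 42)]
centers, labels = kmeans(scores, k=2)
```

In this toy data the students separate into a high-scoring and a low-scoring group; plotting `scores` colored by `labels` would give the kind of visual confirmation the abstract describes.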

A Comparative Study on the phoneme recognition rate with regard to HMM training algorithms (HMM 훈련 알고리즘에 따른 음소인식률 비교 연구)

  • 구명완
    • Proceedings of the Acoustical Society of Korea Conference / 1998.08a / pp.298-301 / 1998
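  • This paper describes how the phoneme recognition rate varies with the HMM training method. The speech models are HMMs with either discrete or continuous probability densities, and the forward-backward and segmental K-means algorithms are used as training algorithms. The continuous probability density consists of N mixtures; when growing the model from a single mixture, the binary tree method and the one-by-one method are used. Phoneme recognition experiments over various combinations show that the forward-backward algorithm with a continuous probability distribution and the one-by-one method gives the best results.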


Search of Transcriptional Motif Combination using Evolutionary Algorithms (진화 알고리즘을 통한 전사 조절 모티프 조합 탐색)

  • 이제근;정제균;오석준;장병탁
    • Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.328-330 / 2004
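  • Gene expression is regulated by the interaction of various transcription factors. The motifs present in these transcription factors directly perform regulatory functions. In most cases, moreover, several motifs act together to regulate the mechanism of gene expression. Identifying which combinations of motifs jointly take part in the transcription process is therefore an important task. In this paper, we apply evolutionary computation to find the combinations of motifs that play an important role in transcription under various conditions. Comparing the results with the basic k-Means algorithm and other methods, we find that the proposed method gives better results with respect to the correlation among genes.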


Using the GA in the Public-Transportation Route-selection Process

  • JUN, Chul-min
    • Korean Journal of Geomatics / v.3 no.2 / pp.123-128 / 2004
  • As the application fields of GIS expand into transportation, the development of internet-based applications for transportation information is attracting increasing attention. Most applications developed so far focus primarily on guidance systems for owner-driven cars. Although some recent ones are devoted to public transportation systems, they show limitations in dealing with the following aspects: (i) people may change transportation means not only within the same type but also among different modes, such as between buses and subways, and (ii) the system should take into account the time taken to transfer from one mode to another. This study suggests a framework for developing a public transportation guidance system that generates optimized paths in a transportation network of mixed means, including buses, subways, and other modes. In this study, genetic algorithms are used to find the best routes, taking into account transfer time and other service-time constraints.


A Study on the Color Image Segmentation Algorithm Based on the Scale-Space Filter and the Fuzzy c-Means Techniques (스케일 공간 필터와 FCM을 이용한 컬러 영상영역화에 관한 연구)

  • 임영원;이상욱
    • Journal of the Korean Institute of Telematics and Electronics / v.25 no.12 / pp.1548-1558 / 1988
  • In this paper, a segmentation algorithm for color images based on the scale-space filter and the fuzzy c-means (FCM) technique is proposed. The methodology uses a coarse-fine concept to reduce the computational burden required by the FCM. The coarse segmentation segments the image roughly using a thresholding technique, while the fine segmentation assigns the pixels left unclassified by the coarse segmentation to the closest class using the FCM. The performance of the proposed algorithm is also compared with other algorithms, such as Ohlander's, Rosenfeld's, and Bezdek's. Intensive computer simulations have been carried out, and the results are discussed in the paper. The simulation results indicate that the proposed algorithm produces the most accurate segmentation on the O-K-S color coordinate while requiring a reasonable amount of computational effort.
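
The fine-segmentation step, in which a leftover pixel is assigned to the closest class by FCM, can be sketched with the standard fuzzy c-means membership formula. The class centers, the fuzzifier m = 2, and the pixel value are assumptions for illustration; the scale-space filtering, the thresholding, and the color coordinates used in the paper are not reproduced.

```python
import math

def fcm_memberships(pixel, centers, m=2.0):
    """Standard FCM membership of one pixel in each class:
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))."""
    dists = [math.dist(pixel, c) for c in centers]
    for i, d in enumerate(dists):
        if d == 0.0:
            # Pixel coincides with a center: full membership in that class.
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]

def assign(pixel, centers, m=2.0):
    """Index of the class with the highest membership."""
    memberships = fcm_memberships(pixel, centers, m)
    return max(range(len(centers)), key=lambda i: memberships[i])

# Hypothetical class centers from a coarse segmentation (3 color channels).
centers = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
unclassified_pixel = (0.0, 0.0, 0.0)

memberships = fcm_memberships(unclassified_pixel, centers)
chosen = assign(unclassified_pixel, centers)
```

Because the class centers are taken as given here, only the pixels the coarse pass could not classify go through the membership computation, which matches the computational saving the abstract describes.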
