• Title/Summary/Keyword: similarity based clustering

Modeling of Self-Constructed Clustering and Performance Evaluation (자기-구성 클러스터링의 모델링 및 성능평가)

  • Ryu Jeong woong;Kim Sung Suk;Song Chang kyu;Kim Sung Soo
    • The Journal of Korean Institute of Communications and Information Sciences / v.30 no.6C / pp.490-496 / 2005
  • In this paper, we propose a self-constructed clustering algorithm based on the inference information of a fuzzy model. The method automatically detects and optimizes the number of clusters and their parameters from input-output data. It improves clustering performance through an extended supervised learning technique that uses output information as well as input characteristics. To reflect the output information in the similarity measure used for clustering, we use the TSK fuzzy model. Conceptually, we design a learning method that feeds the output information back into the clustering, so that the proposed algorithm separates the classes in the input data space. Simulations show that the proposed method is more effective than previous ones.

Similarity measurement based on Min-Hash for Preserving Privacy

  • Cha, Hyun-Jong;Yang, Ho-Kyung;Song, You-Jin
    • International Journal of Advanced Culture Technology / v.10 no.2 / pp.240-245 / 2022
  • Because of the importance of information, encryption algorithms are heavily used. Encrypted raw data is secure, but problems arise when the decryption key is exposed. In particular, large-scale Internet sites such as Facebook and Amazon suffer serious damage when user data is exposed. Recently, research into a new fourth-generation encryption technology that can protect user-related data without using the key required for encryption has been attracting attention, as has data clustering technology that uses encryption. In this paper, we try to reduce key exposure by using homomorphic encryption and to maintain privacy through similarity measurement. Previously studied similarity measures that compare the data as a whole become time-consuming and expensive as the size and scope of the data increase. Min-Hash has therefore been studied as a way to efficiently estimate the similarity between two sets from their signatures. Min-Hash is widely used for anti-plagiarism, graph and image analysis, and genetic analysis. This paper therefore addresses privacy using homomorphic encryption and presents a model for efficient similarity measurement using Min-Hash.
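
The Min-Hash estimate mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' model: it covers only plain Min-Hash on unencrypted sets and leaves out homomorphic encryption entirely. The function names, the use of character shingles, and the seeded SHA-1 hash family are all assumptions made for illustration.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-character shingles of a string (illustrative choice)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _seeded_hash(item, seed):
    """64-bit hash of an item under a given seed (assumed hash family)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=128):
    """Signature = minimum hash value of the set under each seeded hash."""
    if not items:
        raise ValueError("Min-Hash needs a non-empty set")
    return [min(_seeded_hash(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


a = shingles("similarity based clustering")
b = shingles("similarity based clustering of documents")
print(exact_jaccard(a, b), estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

Comparing two fixed-length signatures costs the same regardless of how large the original sets are, which is the efficiency the abstract refers to. More hash functions give a tighter estimate at the cost of longer signatures.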

Research on the Hybrid Paragraph Detection System Using Syntactic-Semantic Analysis (구문의미 분석을 활용한 복합 문단구분 시스템에 대한 연구)

  • Kang, Won Seog
    • Journal of Korea Multimedia Society / v.24 no.1 / pp.106-116 / 2021
  • To improve the quality of systems for grading subjective-type questions and classifying documents, paragraph detection is needed. It is not easy, however, because it requires semantic analysis. Many studies on paragraph detection solve the problem using word-based clustering methods. However, word-based methods cannot use the order of, or the dependency relations between, words. This paper proposes a paragraph detection system that uses the syntactic-semantic relations between words obtained from Korean syntactic-semantic analysis. The system is a hybrid of word-based, concept-based, and syntactic-semantic tree-based detection. Experimental results show that it performs better than the word-based system. The system will be used in Korean subjective question grading and document classification.

Sentence model based subword embeddings for a dialog system

  • Chung, Euisok;Kim, Hyun Woo;Song, Hwa Jeon
    • ETRI Journal / v.44 no.4 / pp.599-612 / 2022
  • This study focuses on improving a word embedding model to enhance the performance of downstream tasks, such as those of dialog systems. To improve traditional word embedding models, such as skip-gram, it is critical to refine the word features and expand the context model. In this paper, we approach the word model from the perspective of subword embedding and attempt to extend the context model by integrating various sentence models. Our proposed sentence model is a subword-based skip-thought model that integrates self-attention and relative position encoding techniques. We also propose a clustering-based dialog model for downstream task verification and evaluate its relationship with the sentence-model-based subword embedding technique. The proposed subword embedding method produces better results than previous methods in evaluating word and sentence similarity. In addition, the downstream task verification, a clustering-based dialog system, demonstrates an improvement of up to 4.86% over the results of FastText in previous research.

Magnifying Block Diagonal Structure for Spectral Clustering (스펙트럼 군집화에서 블록 대각 형태의 유사도 행렬 구성)

  • Heo, Gyeong-Yong;Kim, Kwang-Baek;Woo, Young-Woon
    • Journal of Korea Multimedia Society / v.11 no.9 / pp.1302-1309 / 2008
  • Traditional clustering methods, like k-means or fuzzy clustering, are prototype-based methods that are applicable only to convex clusters. Spectral clustering, on the other hand, tries to find clusters using only local similarity information. Its ability to handle concave clusters has made it popular in recent years, together with the support vector machine (SVM), a kernel-based classification method. However, as in SVM, the kernel width plays an important role and has a great impact on the result. Although several methods have been proposed to decide it automatically, it is still determined based on heuristics. In this paper, we propose an adaptive method that decides the kernel width based on a distance histogram. The proposed method is motivated by the fact that the affinity matrix should form a block diagonal matrix to generate the best result. We use the traditional Euclidean distance together with the random walk distance, which makes it possible to form a more apparent block diagonal affinity matrix. Experimental results show that the proposed method generates a more clearly block-structured affinity matrix than the existing one does.
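
To make the role of the kernel width concrete, here is a rough sketch of a Gaussian affinity matrix and a simple measure of how block diagonal it looks. It does not reproduce the paper's distance-histogram rule or its random walk distance; the `block_contrast` score, the function names, and the toy points are assumptions chosen only for illustration.

```python
import math


def gaussian_affinity(points, sigma):
    """Affinity A[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(points)
    return [
        [math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
         for j in range(n)]
        for i in range(n)
    ]


def block_contrast(affinity, labels):
    """Mean within-group affinity minus mean between-group affinity.

    A hypothetical score: values near 1 mean a clear block diagonal
    structure, values near 0 mean the blocks are not distinguishable.
    """
    within, between = [], []
    n = len(affinity)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # skip the diagonal, which is always 1
            target = within if labels[i] == labels[j] else between
            target.append(affinity[i][j])
    return sum(within) / len(within) - sum(between) / len(between)


# Two well-separated toy groups, listed group by group.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = [0, 0, 0, 1, 1, 1]

for sigma in (0.01, 1.0, 100.0):
    print(sigma, block_contrast(gaussian_affinity(points, sigma), labels))
```

On this toy data, a very small width drives every off-diagonal affinity toward zero and a very large width drives every affinity toward one, so in both cases the blocks fade. That sensitivity is why the abstract treats the choice of kernel width as the central problem.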

Evolutionary Computation-based Hybrid Clustering Technique for Manufacturing Time Series Data (제조 시계열 데이터를 위한 진화 연산 기반의 하이브리드 클러스터링 기법)

  • Oh, Sanghoun;Ahn, Chang Wook
    • Smart Media Journal / v.10 no.3 / pp.23-30 / 2021
  • Clustering of manufacturing time series data is an important grouping technique for detecting and remedying equipment and process defects based on large-scale manufacturing data, but applying existing clustering techniques designed for static data to time series data yields low accuracy. In this paper, an evolutionary computation-based time series cluster analysis approach is presented to improve the coherence of existing clustering techniques. To this end, the image shape resulting from the manufacturing process is first converted into one-dimensional time series data using linear scanning, and the optimal sub-clusters for hierarchical cluster analysis and split cluster analysis are derived from the transformed data based on the Pearson distance metric. Finally, a genetic algorithm is used to derive, from the two cluster analysis results, an optimal cluster combination with minimal similarity. The superiority of the proposed clustering is verified by comparing its performance with that of existing clustering techniques on actual manufacturing process images.
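
The first two steps in this abstract, linear scanning and the Pearson distance, can be sketched as follows. The abstract does not define either one precisely, so this sketch assumes a row-major scan and the common definition of Pearson distance as one minus the correlation coefficient. The function names are hypothetical.

```python
import math


def linear_scan(image):
    """Flatten a 2-D image (list of rows) into a 1-D series, row by row (assumed order)."""
    return [value for row in image for value in row]


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for perfectly correlated series, 2 for anti-correlated."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("Pearson distance is undefined for a constant series")
    return 1 - cov / (norm_x * norm_y)


series_a = linear_scan([[1, 2], [3, 4]])
series_b = linear_scan([[2, 4], [6, 8]])
print(pearson_distance(series_a, series_b))
```

Because the distance depends on shape rather than magnitude, two series that differ only by scale or offset end up close together, which suits comparing process images with similar patterns but different intensities.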

K-Means Clustering with Content Based Doctor Recommendation for Cancer

  • Kumar, Rethina;Ganapathy, Gopinath;Kang, Jeong-Jin
    • International Journal of Advanced Culture Technology / v.8 no.4 / pp.167-176 / 2020
  • Recommendation systems are in high demand among users and researchers who need suggestions matched to their personal needs, such as sorting and suggesting doctors for a patient. Most rating prediction in recommendation systems is based on patients' feedback and information about their treatment. A patient's preferences are inferred from the historical behaviour of similar patients. The similarity between patients is generally measured from patient feedback together with information about the doctor, the treatment methods, and their success rates. This paper presents a new method of predicting top-ranked doctors in recommendation systems. The proposed recommendation system starts by identifying similar doctors based on the patients' health requirements and clusters them using K-Means Efficient Clustering. Our proposed K-Means Clustering with Content Based Doctor Recommendation for Cancer (KMC-CBD) helps users find an optimal solution. The core component of the KMC-CBD recommendation system suggests top recommended doctors to a patient based on similar patients who have already been treated by those doctors, and supports the choice of doctor and hospital according to the patient's requirements and health condition. The recommendation system first applies K-Means clustering, an unsupervised learning method, to doctors according to their profiles and lists the doctors according to their medical profiles. The content-based doctor recommendation system then generates a top-rated list of doctors for the given patient profile by exploiting health data shared by the internet community. Patients can find the most similar patients, analyze how they were treated for similar diseases, and send and receive suggestions to solve their health issues. To improve the efficiency of the recommendation system, patients can express their health information in a natural-language sentence. The recommendation system analyzes the sentence, identifies the most relevant medical area for that specific case, and uses this information for the recommendation task. Together with the information provided by users, this lets the recommendation system suggest the right doctors for a specific health problem. The proposed system is implemented in Python with the necessary functions and dataset.

Two-phase Content-based Image Retrieval Using the Clustering of Feature Vector (특징벡터의 끌러스터링 기법을 통한 2단계 내용기반 이미지검색 시스템)

  • 조정원;최병욱
    • Journal of the Institute of Electronics Engineers of Korea CI / v.40 no.3 / pp.171-180 / 2003
  • A content-based image retrieval (CBIR) system builds an image database using low-level features such as color, shape, and texture, and returns images similar to what the user wants when a retrieval request occurs. What the user is interested in is the response time, considering both the time to build the index database and the time to obtain retrieval results for the query image. In a content-based image retrieval system, computing the similarity between a query and the images in the database takes up most of the total response time. In this paper, we propose a two-phase search method that clusters feature vectors in order to minimize the similarity computing time. Experimental results show that this two-phase search method is two times faster than the conventional full-search method, which uses the original features of all images in the image database, while maintaining the same retrieval relevance. The proposed method becomes more effective as the number of images increases.
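
The general idea of a two-phase search over clustered feature vectors might look like the sketch below. It is an assumption-laden illustration, not the paper's system: the centroids are given rather than learned, only the single nearest cluster is searched, and the names `build_index` and `two_phase_search` are hypothetical.

```python
import math


def build_index(features, centroids):
    """Assign each image's feature vector to its nearest centroid (done once, offline)."""
    clusters = {c: [] for c in range(len(centroids))}
    for image_id, vector in features.items():
        nearest = min(range(len(centroids)),
                      key=lambda c: math.dist(vector, centroids[c]))
        clusters[nearest].append(image_id)
    return clusters


def two_phase_search(query, features, centroids, clusters, top_k=3):
    """Phase 1 picks the nearest cluster; phase 2 ranks only the images inside it."""
    # Phase 1: compare the query with the centroids only.
    nearest = min(range(len(centroids)),
                  key=lambda c: math.dist(query, centroids[c]))
    # Phase 2: compute similarity only for the candidates in that cluster.
    candidates = clusters[nearest]
    ranked = sorted(candidates, key=lambda i: math.dist(query, features[i]))
    return ranked[:top_k]


features = {"a": (0, 0), "b": (0, 1), "c": (10, 10), "d": (10, 11)}
centroids = [(0, 0.5), (10, 10.5)]
clusters = build_index(features, centroids)
print(two_phase_search((9, 10), features, centroids, clusters, top_k=2))
```

The saving comes from phase 2 touching only one cluster instead of the whole database. Searching a single cluster can miss neighbours that lie just across a cluster boundary, so a practical system would likely need to examine several nearby clusters to keep retrieval relevance equal to a full search.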

Traffic Speed Prediction Based on Graph Neural Networks for Intelligent Transportation System (지능형 교통 시스템을 위한 Graph Neural Networks 기반 교통 속도 예측)

  • Kim, Sunghoon;Park, Jonghyuk;Choi, Yerim
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.20 no.1 / pp.70-85 / 2021
  • Deep learning methodology, which has been actively studied in recent years, has improved the performance of artificial intelligence. Accordingly, systems utilizing deep learning have been proposed in various industries. In traffic systems, spatio-temporal graph modeling using GNNs has been found to be effective in predicting traffic speed. Still, it has the disadvantage that the model is trained inefficiently due to a memory bottleneck. Therefore, in this study, the road network is clustered with a graph clustering algorithm to reduce the memory bottleneck while achieving superior performance. To verify the proposed method, the similarity of road speed distributions was measured using Jensen-Shannon divergence, based on an analysis of Incheon UTIC data. The road network was then clustered by spectral clustering based on the measured similarity. The experiments showed that when the road network was divided into seven networks, the memory bottleneck was alleviated while the model recorded the best performance compared to the baselines, with an MAE of 5.52 km/h.
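
The similarity step in this abstract relies on Jensen-Shannon divergence between speed distributions. A minimal sketch is shown below, assuming each road's speeds have already been binned into a histogram with shared bins. The helper names and the toy histograms are illustrative, and turning the divergence into a similarity as `1 - JSD` is an assumption rather than the paper's stated choice.

```python
import math


def normalize(counts):
    """Turn histogram counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one observation")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Toy speed histograms for two roads over the same speed bins.
road_a = normalize([5, 20, 50, 25])
road_b = normalize([10, 30, 40, 20])
similarity = 1 - js_divergence(road_a, road_b)  # assumed conversion to a similarity
print(similarity)
```

Unlike plain KL divergence, this measure is symmetric and stays finite when one road has empty bins, which makes it convenient for filling a pairwise similarity matrix to pass to spectral clustering.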

Optimization of Fuzzy Set-based Fuzzy Inference Systems Based on Evolutionary Data Granulation (진화론적 데이터 입자에 기반한 퍼지 집합 기반 퍼지 추론 시스템의 최적화)

  • Park, Keon-Jun;Lee, Bong-Yoon;Oh, Sung-Kwun
    • Proceedings of the KIEE Conference / 2004.11c / pp.343-345 / 2004
  • We propose a new category of fuzzy set-based fuzzy inference systems based on data granulation related to fuzzy space division for each variable. Data granules are viewed as linked collections of objects (data, in particular) drawn together by the criteria of proximity, similarity, or functionality. Granulation of data with the aid of the Hard C-Means (HCM) clustering algorithm helps determine the initial parameters of the fuzzy model, such as the initial apexes of the membership functions and the initial values of the polynomial functions used in the premise and consequence parts of the fuzzy rules. The initial parameters are then tuned effectively with the aid of genetic algorithms (GAs) and the least squares method. A numerical example is included to evaluate the performance of the proposed model.
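
As a rough illustration of how Hard C-Means could supply initial apexes for the membership functions of a single input variable, consider the sketch below. It is a plain one-dimensional HCM (k-means) with a deterministic initialization chosen for reproducibility. The function name, the initialization rule, and the toy data are assumptions, and the later GA and least squares tuning is not shown.

```python
def hard_c_means_1d(values, c, iterations=100):
    """Cluster 1-D data into c hard clusters and return the sorted cluster centers."""
    data = sorted(values)
    if c < 1 or c > len(data):
        raise ValueError("c must be between 1 and the number of data points")
    # Assumed initialization: spread the initial centers evenly over the sorted data.
    centers = [data[(2 * i + 1) * len(data) // (2 * c)] for i in range(c)]
    for _ in range(iterations):
        groups = [[] for _ in range(c)]
        for v in data:
            # Hard assignment: each point belongs to exactly one nearest center.
            nearest = min(range(c), key=lambda k: abs(v - centers[k]))
            groups[nearest].append(v)
        # Recompute each center as the mean of its group (keep it if the group is empty).
        new_centers = [sum(g) / len(g) if g else centers[k]
                       for k, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


# Each center could serve as the initial apex of one membership function.
apexes = hard_c_means_1d([1.0, 1.2, 0.8, 5.0, 5.2, 4.8], c=2)
print(apexes)
```

Running this per input variable gives one set of apexes per variable, matching the abstract's description of dividing the fuzzy space for each variable separately.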
