• 제목/요약/키워드: Search-Result Clustering

검색결과 75건 처리시간 0.032초

Intelligent Multi-Agent Distributed Platform based on Dynamic Object Group Management using Fk-means (Fk means를 이용한 동적객체그룹관리기반 지능형 멀티 에이전트 분산플랫폼)

  • Lee, Jae-wan;Na, Hye-Young;Mateo, Romeo Mark A.
    • Journal of Internet Computing and Services
    • /
    • 제10권1호
    • /
    • pp.101-110
    • /
    • 2009
  • Multi-agent systems are mostly used to integrate the intelligent and distributed approaches to various systems for effective sharing of resources and dynamic system reconfigurations. Object replication is usually used to implement fault tolerance and solve the problem of unexpected failures to the system. This paper presents the intelligent multi-agent distributed platform based on the dynamic object group management and proposes an object search technique based on the proposed filtered k-means (Fk-means). We propose Fk-means for the search mechanism to find alternative objects in the event of object failures and transparently reconnect client to the object. The filtering range of Fk-means value is set only to include relevant objects within the group to perform the search method efficiently. The simulation result shows that the proposed mechanism provides fast and accurate search for the distributed object groups.

  • PDF

Efficient Processing of Multidimensional Vessel USN Stream Data using Clustering Hash Table (클러스터링 해쉬 테이블을 이용한 다차원 선박 USN 스트림 데이터의 효율적인 처리)

  • Song, Byoung-Ho;Oh, Il-Whan;Lee, Seong-Ro
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • 제47권6호
    • /
    • pp.137-145
    • /
    • 2010
  • Digital vessel have to accurate and efficient mange the digital data from various sensors in the digital vessel. But, In sensor network, it is difficult to transmit and analyze the entire stream data depending on limited networks, power and processor. Therefore it is suitable to use alternative stream data processing after classifying the continuous stream data. In this paper, We propose efficient processing method that arrange some sensors (temperature, humidity, lighting, voice) and process query based on sliding window for efficient input stream and pre-clustering using multiple Support Vector Machine(SVM) algorithm and manage hash table to summarized information. Processing performance improve as store and search and memory using hash table and usage reduced so maintain hash table in memory. We obtained to efficient result that accuracy rate and processing performance of proposal method using 35,912 data sets.

A Sequential Indexing Method for Multidimensional Range Queries (다차원 범위 질의를 위한 순차 색인 기법)

  • Cha Guang-Ho
    • Journal of KIISE:Databases
    • /
    • 제32권3호
    • /
    • pp.254-262
    • /
    • 2005
  • This paper presents a new sequential indexing method called segment-page indexing (SP-indexing) for multidimensional range queries. The design objectives of SP-indexing are twofold:(1) improving the range query performance of multidimensional indexing methods (MIMs) and (2) providing a compromise between optimal index clustering and the full index reorganization overhead. Although more than ten years of database research has resulted in a great variety of MIMs, most efforts have focused on data-level clustering and there has been less attempt to cluster indexes. As a result, most relevant index nodes are widely scattered on a disk and many random disk accesses are required during the search. SP-indexing avoids such scattering by storing the relevant nodes contiguously in a segment that contains a sequence of contiguous disk pages and improves performance by offering sequential access within a segment. Experimental results demonstrate that SP-indexing improves query performance up to several times compared with traditional MIMs using small disk pages with respect to total elapsed time and it reduces waste of disk bandwidth due to the use of simple large pages.

Product Recommendation System on VLDB using k-means Clustering and Sequential Pattern Technique (k-means 클러스터링과 순차 패턴 기법을 이용한 VLDB 기반의 상품 추천시스템)

  • Shim, Jang-Sup;Woo, Seon-Mi;Lee, Dong-Ha;Kim, Yong-Sung;Chung, Soon-Key
    • The KIPS Transactions:PartD
    • /
    • 제13D권7호
    • /
    • pp.1027-1038
    • /
    • 2006
  • There are many technical problems in the recommendation system based on very large database(VLDB). So, it is necessary to study the recommendation system' structure and the data-mining technique suitable for the large scale Internet shopping mail. Thus we design and implement the product recommendation system using k-means clustering algorithm and sequential pattern technique which can be used in large scale Internet shopping mall. This paper processes user information by batch processing, defines the various categories by hierarchical structure, and uses a sequential pattern mining technique for the search engine. For predictive modeling and experiment, we use the real data(user's interest and preference of given category) extracted from log file of the major Internet shopping mall in Korea during 30 days. And we define PRP(Predictive Recommend Precision), PRR(Predictive Recommend Recall), and PF1(Predictive Factor One-measure) for evaluation. In the result of experiments, the best recommendation time and the best learning time of our system are much as O(N) and the values of measures are very excellent.

A Study on the Clustering Technique Associated with Statistical Term Relatedness in Information Retrieval (정보검색(情報檢索)에 있어서 용어(用語)의 통계적(統計的) 관련성(關聯性)을 응용(應用)한 클러스터링기법(技法))

  • Jeong, Jun-Min
    • Journal of Information Management
    • /
    • 제18권4호
    • /
    • pp.98-117
    • /
    • 1985
  • At the present time, the role and importance of information retrieval has greatly increased for two main reasons: the coverage of the searchable collections is now extensive and collection size may exceed several million documents; further more, the search results can now be obtained more or less instantaneously using online procedures and computer terminal devices that provide interaction and communication between system and users. The large collection size make it plausible to the users that relevant information will in fact be retrieved as a result of a search operation, and the probability of obtaining the search output without delay creates a substantial user demand for the retrieval services.

  • PDF

Evaluation of DES key search stability using Parallel Computing (병렬 컴퓨팅을 이용한 DES 키 탐색 안정성 분석)

  • Yoon, JunWeon;Choi, JangWon;Park, ChanYeol;Kong, Ki-Sik
    • Journal of Digital Contents Society
    • /
    • 제14권1호
    • /
    • pp.65-72
    • /
    • 2013
  • Current and future parallel computing model has been suggested for running and solving large-scale application problems such as climate, bio, cryptology, and astronomy, etc. Parallel computing is a form of computation in which many calculations are carried out simultaneously. And we are able to shorten the execution time of the program, as well as can extend the scale of the problem that can be solved. In this paper, we perform the actual cryptographic algorithms through parallel processing and evaluate its efficiency. Length of the key, which is stable criterion of cryptographic algorithm, judged according to the amount of complete enumeration computation. So we present a detailed procedure of DES key search cryptographic algorithms for executing of enumeration computation in parallel processing environment. And then, we did the simulation through applying to clustering system. As a result, we can measure the safety and solidity of cryptographic algorithm.

Efficient Illegal Contents Detection and Attacker Profiling in Real Environments

  • Kim, Jin-gang;Lim, Sueng-bum;Lee, Tae-jin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제16권6호
    • /
    • pp.2115-2130
    • /
    • 2022
  • With the development of over-the-top (OTT) services, the demand for content is increasing, and you can easily and conveniently acquire various content in the online environment. As a result, copyrighted content can be easily copied and distributed, resulting in serious copyright infringement. Some special forms of online service providers (OSP) use filtering-based technologies to protect copyrights, but illegal uploaders use methods that bypass traditional filters. Uploading with a title that bypasses the filter cannot use a similar search method to detect illegal content. In this paper, we propose a technique for profiling the Heavy Uploader by normalizing the bypassed content title and efficiently detecting illegal content. First, the word is extracted from the normalized title and converted into a bit-array to detect illegal works. This Bloom Filter method has a characteristic that there are false positives but no false negatives. The false positive rate has a trade-off relationship with processing performance. As the false positive rate increases, the processing performance increases, and when the false positive rate decreases, the processing performance increases. We increased the detection rate by directly comparing the word to the result of increasing the false positive rate of the Bloom Filter. The processing time was also as fast as when the false positive rate was increased. Afterwards, we create a function that includes information about overall piracy and identify clustering-based heavy uploaders. Analyze the behavior of heavy uploaders to find the first uploader and detect the source site.

A Clustering Technique to Minimize Energy Consumption of Sensor networks by using Enhanced Genetic Algorithm (진보된 유전자 알고리즘 이용하여 센서 네트워크의 에너지 소모를 최소화하는 클러스터링 기법)

  • Seo, Hyun-Sik;Oh, Se-Jin;Lee, Chae-Woo
    • Journal of the Institute of Electronics Engineers of Korea TC
    • /
    • 제46권2호
    • /
    • pp.27-37
    • /
    • 2009
  • Sensor nodes forming a sensor network have limited energy capacity such as small batteries and when these nodes are placed in a specific field, it is important to research minimizing sensor nodes' energy consumption because of difficulty in supplying additional energy for the sensor nodes. Clustering has been in the limelight as one of efficient techniques to reduce sensor nodes' energy consumption in sensor networks. However, energy saving results can vary greatly depending on election of cluster heads, the number and size of clusters and the distance among the sensor nodes. /This research has an aim to find the optimal set of clusters which can reduce sensor nodes' energy consumption. We use a Genetic Algorithm(GA), a stochastic search technique used in computing, to find optimal solutions. GA performs searching through evolution processes to find optimal clusters in terms of energy efficiency. Our results show that GA is more efficient than LEACH which is a clustering algorithm without evolution processes. The two-dimensional GA (2D-GA) proposed in this research can perform more efficient gene evolution than one-dimensional GA(1D-GA)by giving unique location information to each node existing in chromosomes. As a result, the 2D-GA can find rapidly and effectively optimal clusters to maximize lifetime of the sensor networks.

Color-related Query Processing for Intelligent E-Commerce Search (지능형 검색엔진을 위한 색상 질의 처리 방안)

  • Hong, Jung A;Koo, Kyo Jung;Cha, Ji Won;Seo, Ah Jeong;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • 제25권1호
    • /
    • pp.109-125
    • /
    • 2019
  • As interest on intelligent search engines increases, various studies have been conducted to extract and utilize the features related to products intelligencely. In particular, when users search for goods in e-commerce search engines, the 'color' of a product is an important feature that describes the product. Therefore, it is necessary to deal with the synonyms of color terms in order to produce accurate results to user's color-related queries. Previous studies have suggested dictionary-based approach to process synonyms for color features. However, the dictionary-based approach has a limitation that it cannot handle unregistered color-related terms in user queries. In order to overcome the limitation of the conventional methods, this research proposes a model which extracts RGB values from an internet search engine in real time, and outputs similar color names based on designated color information. At first, a color term dictionary was constructed which includes color names and R, G, B values of each color from Korean color standard digital palette program and the Wikipedia color list for the basic color search. The dictionary has been made more robust by adding 138 color names converted from English color names to foreign words in Korean, and with corresponding RGB values. Therefore, the fininal color dictionary includes a total of 671 color names and corresponding RGB values. The method proposed in this research starts by searching for a specific color which a user searched for. Then, the presence of the searched color in the built-in color dictionary is checked. If there exists the color in the dictionary, the RGB values of the color in the dictioanry are used as reference values of the retrieved color. If the searched color does not exist in the dictionary, the top-5 Google image search results of the searched color are crawled and average RGB values are extracted in certain middle area of each image. To extract the RGB values in images, a variety of different ways was attempted since there are limits to simply obtain the average of the RGB values of the center area of images. As a result, clustering RGB values in image's certain area and making average value of the cluster with the highest density as the reference values showed the best performance. Based on the reference RGB values of the searched color, the RGB values of all the colors in the color dictionary constructed aforetime are compared. Then a color list is created with colors within the range of ${\pm}50$ for each R value, G value, and B value. Finally, using the Euclidean distance between the above results and the reference RGB values of the searched color, the color with the highest similarity from up to five colors becomes the final outcome. In order to evaluate the usefulness of the proposed method, we performed an experiment. In the experiment, 300 color names and corresponding color RGB values by the questionnaires were obtained. They are used to compare the RGB values obtained from four different methods including the proposed method. The average euclidean distance of CIE-Lab using our method was about 13.85, which showed a relatively low distance compared to 3088 for the case using synonym dictionary only and 30.38 for the case using the dictionary with Korean synonym website WordNet. The case which didn't use clustering method of the proposed method showed 13.88 of average euclidean distance, which implies the DBSCAN clustering of the proposed method can reduce the Euclidean distance. This research suggests a new color synonym processing method based on RGB values that combines the dictionary method with the real time synonym processing method for new color names. This method enables to get rid of the limit of the dictionary-based approach which is a conventional synonym processing method. This research can contribute to improve the intelligence of e-commerce search systems especially on the color searching feature.

Analysis and Visualization for Comment Messages of Internet Posts (인터넷 게시물의 댓글 분석 및 시각화)

  • Lee, Yun-Jung;Ji, Jeong-Hoon;Woo, Gyun;Cho, Hwan-Gue
    • The Journal of the Korea Contents Association
    • /
    • 제9권7호
    • /
    • pp.45-56
    • /
    • 2009
  • There are many internet users who collect the public opinions and express their opinions for internet news or blog articles through the replying comment on online community. But, it is hard to search and explore useful messages on web blogs since most of web blog systems show articles and their comments to the form of sequential list. Also, spam and malicious comments have become social problems as the internet users increase. In this paper, we propose a clustering and visualizing system for responding comments on large-scale weblogs, namely 'Daum AGORA,' using similarity analysis. Our system shows the comment clustering result as a simple screen view. Our system also detects spam comments using Needleman-Wunsch algorithm that is a well-known algorithm in bioinformatics.