• Title/Summary/Keyword: query clustering


Efficient Processing of Multidimensional Vessel USN Stream Data using Clustering Hash Table (클러스터링 해쉬 테이블을 이용한 다차원 선박 USN 스트림 데이터의 효율적인 처리)

  • Song, Byoung-Ho;Oh, Il-Whan;Lee, Seong-Ro
    • Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.6 / pp.137-145 / 2010
  • A digital vessel must accurately and efficiently manage the digital data produced by its various sensors. In a sensor network, however, it is difficult to transmit and analyze the entire data stream because of limited network bandwidth, power, and processing capacity. It is therefore appropriate to classify the continuous stream data first and then apply an alternative stream-processing method. In this paper, we propose an efficient processing method that deploys several sensors (temperature, humidity, lighting, voice), processes queries over a sliding window on the input stream, pre-clusters the data using a multiple Support Vector Machine (SVM) algorithm, and manages the summarized information in a hash table. Storing and searching through the hash table improves processing performance, and because memory usage is reduced, the hash table can be kept in memory. Experiments on 35,912 data sets showed that the proposed method is effective in both accuracy and processing performance.
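
The combination of a sliding window and an in-memory hash table of summaries can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the class name `WindowedSummary`, the count/sum summary, and the threshold rule standing in for the SVM pre-clustering step are all assumptions made for the example.

```python
from collections import deque, defaultdict


class WindowedSummary:
    """Keep per-class summaries of the last `size` readings in a hash table."""

    def __init__(self, size, classify):
        self.size = size
        # `classify` is a hypothetical stand-in for the SVM pre-clustering step.
        self.classify = classify
        self.window = deque()                        # (label, value) pairs in the window
        self.table = defaultdict(lambda: [0, 0.0])   # label -> [count, sum]

    def push(self, value):
        """Add one reading and evict the oldest one if the window is full."""
        label = self.classify(value)
        self.window.append((label, value))
        entry = self.table[label]
        entry[0] += 1
        entry[1] += value
        if len(self.window) > self.size:
            old_label, old_value = self.window.popleft()
            old = self.table[old_label]
            old[0] -= 1
            old[1] -= old_value
            if old[0] == 0:
                del self.table[old_label]

    def mean(self, label):
        """Answer a query from the summary alone, without rescanning the stream."""
        if label not in self.table:
            return None
        count, total = self.table[label]
        return total / count


# Hypothetical temperature stream with a simple threshold classifier.
summary = WindowedSummary(3, lambda t: "hot" if t >= 30 else "normal")
for reading in [20, 35, 22, 40, 31]:
    summary.push(reading)
```

Queries such as `summary.mean("hot")` touch only the hash table, which is why the table stays small enough to hold in memory regardless of how long the stream runs.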

Semantic Clustering Model for Analytical Classification of Documents in Cloud Environment (클라우드 환경에서 문서의 유형 분류를 위한 시맨틱 클러스터링 모델)

  • Kim, Young Soo;Lee, Byoung Yup
    • The Journal of the Korea Contents Association / v.17 no.11 / pp.389-397 / 2017
  • Recently, semantic web documents are being produced and added to repositories in cloud computing environments, which calls for an intelligent semantic agent for analytical classification of documents and information retrieval. Traditional information retrieval methods use keywords as the query and deliver the list of documents returned by the search. Users carry a heavy workload in examining the contents because these methods do not provide much semantic similarity information. To solve these problems, we suggest a semantic clustering model based on keyword frequency and concept matching, built on Hadoop and NoSQL, to improve the accuracy of similarity-based classification. Implementing the suggested technique in a cloud computing environment makes it possible to classify and discover similar documents with improved classification accuracy. The suggested model is expected to be used in constructing semantic web retrieval systems, making them more flexible in retrieving the proper documents.

Multi-class Support Vector Machines Model Based Clustering for Hierarchical Document Categorization in Big Data Environment (빅 데이터 환경에서 계층적 문서 유형 분류를 위한 클러스터링 기반 다중 SVM 모델)

  • Kim, Young Soo;Lee, Byoung Yup
    • The Journal of the Korea Contents Association / v.17 no.11 / pp.600-608 / 2017
  • Recently, data volumes have been growing exponentially with the rapid expansion of the Internet. Since users need only part of all this information, they carry a heavy workload in examining and discovering the necessary contents. Information retrieval must therefore provide hierarchical class information and an examination priority by evaluating the similarity between the query and the documents. In this paper, we propose a clustering-based multi-class support vector machine model for hierarchical document categorization that makes semantic search possible by considering word co-occurrence measures. Combining hierarchical document categorization with an SVM classifier gives high performance for the analytical classification of web documents, whose number increases exponentially as the document hierarchy is extended. More information retrieval systems are expected to adopt the proposed model in their development and to provide an accurate and rapid information retrieval service.

Materialized View Selection Algorithm using Clustering Technique in Data Warehouse (데이터 웨어하우스에서 클러스터링 기법을 이용한 실체화 뷰 선택 알고리즘)

  • Yang, Jin-Hyuk;Chung, In-Jeong
    • The Transactions of the Korea Information Processing Society / v.7 no.8 / pp.2273-2286 / 2000
  • In order to obtain a precise and fast response to an analytical query, proper selection of the views to materialize in a data warehouse is crucial. In traditional view selection algorithms, whole relations are considered for selection as materialized views. However, materializing whole relations rather than parts of relations results in much worse performance in terms of time and space cost. Therefore, we present an improved algorithm for selecting views to materialize that uses a clustering method to overcome this problem of conventional view selection algorithms. In the presented algorithm, ASVMRT (Algorithm for Selection of Views to Materialize using Reduced Table), we first generate reduced tables in the data warehouse using automatic clustering based on attribute-value density, and then consider combinations of reduced tables as materialized views instead of combinations of the original base relations. To justify the proposed algorithm, we show experimental results in which both time and space costs are approximately 1.8 times better than those of the conventional algorithms.

Relevance Feedback Method of an Extended Boolean Model using Hierarchical Clustering Techniques (계층적 클러스터링 기법을 이용한 확장 불리언 모델의 적합성 피드백 방법)

  • 최종필;김민구
    • Journal of KIISE:Software and Applications / v.31 no.10 / pp.1374-1385 / 2004
  • The relevance feedback process uses information obtained from a user about an initially retrieved set of documents to improve subsequent search formulations and retrieval performance. In the extended Boolean model, relevance feedback implies not only that new query terms must be identified, but also that the terms must be connected properly with the Boolean AND/OR operators. Salton et al. proposed a relevance feedback method for the extended Boolean model, called the DNF (disjunctive normal form) method. However, this method has a critical problem in generating reformulated queries. In this study, we investigate the problem of the DNF method and propose a relevance feedback method that uses hierarchical clustering techniques to solve it. We show the results of experiments performed on two data sets: the DOE collection in TREC 1 and the Web TREC 10 collection.
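
For readers unfamiliar with the extended Boolean model, the sketch below scores a document against a query in disjunctive normal form using the commonly cited p-norm formulation with equally weighted query terms. It only illustrates how AND and OR are softened into graded scores; it is not the relevance feedback procedure of this paper, and the function names, the example document, and the choice p = 2 are assumptions.

```python
def pnorm_or(weights, p=2.0):
    """p-norm OR: high when at least one term weight is high."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def pnorm_and(weights, p=2.0):
    """p-norm AND: high only when all term weights are high."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


def dnf_similarity(doc, clauses, p=2.0):
    """Score a DNF query: an OR over clauses, each clause an AND over terms.

    doc maps a term to its weight in [0, 1]; missing terms count as 0.
    """
    clause_scores = [
        pnorm_and([doc.get(term, 0.0) for term in clause], p)
        for clause in clauses
    ]
    return pnorm_or(clause_scores, p)


# Hypothetical document and query: (query AND clustering) OR (feedback)
doc = {"query": 1.0, "clustering": 1.0}
score = dnf_similarity(doc, [["query", "clustering"], ["feedback"]])
```

With p = 1 both operators reduce to the plain average of the term weights, and as p grows they approach strict Boolean behaviour, which is why the way terms are grouped under AND and OR matters when a query is reformulated.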

Term Clustering and Duplicate Distribution for Efficient Parallel Information Retrieval (효율적인 병렬정보검색을 위한 색인어 군집화 및 분산저장 기법)

  • 강재호;양재완;정성원;류광렬;권혁철;정상화
    • Journal of KIISE:Software and Applications / v.30 no.1_2 / pp.129-139 / 2003
  • The PC cluster architecture is considered a cost-effective alternative to existing supercomputers for realizing a high-performance information retrieval (IR) system. To implement an efficient IR system on a PC cluster, it is essential to achieve maximum parallelism by distributing the data to the local hard disks of the PCs in such a way that the disk I/O and the subsequent computation are spread as evenly as possible across all the PCs. If the terms in the inverted index file can be classified into closely related clusters, parallelism can be maximized by distributing them to the PCs in an interleaved manner. One of the goals of this research is to develop methods for automatically clustering the terms based on the likelihood of their co-occurrence in the same query. In this paper, we also propose a method for duplicate distribution of inverted index records among the PCs to achieve fault tolerance as well as dynamic load balancing. Experiments with a large corpus revealed the efficiency and effectiveness of our method.
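
The idea of interleaving clustered terms across nodes, with a duplicate copy for fault tolerance, can be illustrated with a small round-robin sketch. The abstract does not give the actual placement or replication rules, so the function `distribute`, the "backup on the next node" policy, and the example clusters are assumptions.

```python
def distribute(clusters, num_nodes):
    """Assign each term a (primary, backup) node pair.

    Terms are laid out round-robin in cluster order, so terms of the same
    cluster (which tend to appear in the same query) land on different nodes
    and can be read in parallel. The backup copy goes to the next node, an
    assumed policy chosen only for illustration.
    """
    if num_nodes < 2:
        raise ValueError("need at least two nodes to hold a duplicate")
    placement = {}
    slot = 0
    for cluster in clusters:
        for term in cluster:
            primary = slot % num_nodes
            backup = (primary + 1) % num_nodes
            placement[term] = (primary, backup)
            slot += 1
    return placement


# Hypothetical term clusters derived from query co-occurrence.
clusters = [["query", "clustering", "index"], ["vessel", "sensor"]]
placement = distribute(clusters, num_nodes=3)
```

Because every inverted index record exists on two nodes, a query can be served when one node fails, and the scheduler is free to read from whichever copy sits on the less loaded node.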

Materialized View Selection Algorithm using Clustering Technique in Data Warehouse (데이터 웨어하우스에서 클러스터링 기법을 이용한 실체화 뷰 선택 알고리즘)

  • Yang, Jin-Hyuk;Chung, In-Jeong
    • Annual Conference of KIPS / 2000.04a / pp.28-35 / 2000
  • In order to obtain a precise and fast response to an analytical query, proper selection of the views to materialize in a data warehouse is crucial. In traditional algorithms, a whole relation is considered for selection as a materialized view. However, materializing a whole relation rather than part of a relation results in much worse performance in terms of time and space cost. Therefore, we present a new algorithm for selecting views to materialize that uses a clustering method to address this problem and improve the performance of the data warehouse. In the presented algorithm, ASVMR (Algorithm for Selection of Views to Materialize using Reduced table), we first generate reduced tables in the data warehouse using automatic clustering based on attribute-value density, and then consider combinations of reduced tables as materialized views instead of combinations of the original base relations. We also show experimental results in which both time and space costs are approximately 1.8 times better than those of the conventional algorithms.

Factors Clustering Approach to Parametric Cost Estimates And OLAP Driver

  • JaeHo, Cho;BoSik, Son;JaeYoul, Chun
    • International conference on construction engineering and project management / 2009.05a / pp.707-716 / 2009
  • The role of the cost modeller is to facilitate the design process through the systematic application of cost factors so as to maintain a sensible and economic relationship between cost, quantity, utility and appearance, which helps in achieving the client's requirements within an agreed budget. A number of studies on cost estimates in the early design stage have focused on improving accuracy or on impact factors. This body of research has established that cost estimates are undertaken progressively throughout the design stage and make use of the information available at each phase. In addition, cost estimates in the early design stage must analyze the information under various preconditions before a more developed design is reached, because a design can be modified at any point in the process depending on clients' requirements. Parametric cost estimating models have been adopted to support decision making in such a changeable environment in the early design stage. These models use similar instances or patterns of historical cases composed of project information, geographic design features, and data relevant to quantity or cost. The OLAP technique analyzes subject data from multi-dimensional points of view; it supports the query, analysis, and comparison of required information through diverse queries. OLAP's data structure matches well with a multiview-analysis framework. Accordingly, this study implements a multi-dimensional information system, based on OLAP technology, for case-based quantity data related to design information, and then analyzes the impact factors of quantity by design criteria or by parameters of the same meaning. On the basis of the factors examined, this study generates rules on quantity measures and produces resemblance classes using data mining clustering. The resulting knowledge base consists of a set of data classified into group patterns, which provides an appropriate basis for the parametric cost estimating method.

Development of Personalized Recommendation System using RFM method and k-means Clustering (RFM기법과 k-means 기법을 이용한 개인화 추천시스템의 개발)

  • Cho, Young-Sung;Gu, Mi-Sug;Ryu, Keun-Ho
    • Journal of the Korea Society of Computer and Information / v.17 no.6 / pp.163-172 / 2012
  • Collaborative filtering, which is used as an explicit method in existing recommendation systems, cannot reflect the exact attributes of items and still suffers from sparsity and scalability problems, although it has been used in practice to mitigate these defects. This paper proposes a personalized recommendation system using the RFM method and k-means clustering for u-commerce, which requires real-time accessibility and agility. Using an implicit method that does not require complicated query processing of rating requests and responses, the system relies on RFM analysis and k-means clustering to reflect item attributes in order to find items with high purchasability. The proposed system clusters customers using feature vectors built from customer information, calculates the preference for each item category from purchase history data, and can thereby recommend items efficiently. To estimate its performance, the proposed system is compared with an existing system. As a result, experiments on a dataset collected from an Internet cosmetics shopping mall show that the proposed system improves on the existing one when evaluated according to the criteria of logicality.
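
A rough sketch of the two building blocks named above, RFM feature extraction and k-means clustering, is given below. It is not the system described in the paper: the purchase record layout `(customer, day, amount)`, the deterministic initialisation from the first k points, and the use of raw, unscaled RFM values are simplifying assumptions. In practice the three features would normally be normalised before clustering.

```python
def rfm(purchases, today):
    """Compute (recency, frequency, monetary) per customer.

    purchases: iterable of (customer, day, amount) tuples, a hypothetical layout.
    Recency is the number of days since the customer's latest purchase.
    """
    stats = {}
    for customer, day, amount in purchases:
        last, freq, money = stats.get(customer, (day, 0, 0.0))
        stats[customer] = (max(last, day), freq + 1, money + amount)
    return {c: (today - last, freq, money) for c, (last, freq, money) in stats.items()}


def kmeans(points, k, iterations=20):
    """Plain k-means with squared Euclidean distance.

    Centres start at the first k points so that the result is deterministic.
    """
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centre.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical purchase history: (customer, day, amount)
purchases = [("a", 1, 10.0), ("a", 9, 20.0), ("b", 2, 5.0),
             ("c", 8, 200.0), ("c", 9, 300.0)]
features = rfm(purchases, today=10)
customers = sorted(features)
labels, centers = kmeans([features[c] for c in customers], k=2)
```

Each resulting cluster groups customers with similar purchasing behaviour, and per-category preferences can then be computed within a cluster from purchase history alone, with no explicit ratings.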

Efficient-Clustering using the Dynamic Sky line Query in Sensor Network Environment (센서 네트워크 환경에서 동적 스카이라인 질의를 이용한 효율적인 클러스터링)

  • Jo, Yeong-Bok;Lee, Sang-Ho
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.11a / pp.287-291 / 2007
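  • As the nodes of conventional sensor network environments become mobile, existing methods for building clusters and selecting cluster heads, which were designed for static nodes, are no longer adequate. To make them suitable for dynamic nodes, we use the existing continuous skyline query method to build clusters and select cluster heads, aiming at an efficient sensor network environment. Conventional approaches first build the clusters, then compare the residual energy of the nodes within each cluster and select the node with the most energy as the head, taking routing into account. When the sensor nodes are mobile, however, location is another attribute that must be considered. This paper therefore proposes an efficient region determination technique that, unlike conventional approaches, selects the cluster head first and then defines everything within R hops of the cluster head as one cluster.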
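
The two steps described above, picking head candidates with a skyline query over several attributes and then forming a cluster from the nodes within R hops of the head, can be sketched as follows. This is only an illustration under assumptions: the attribute pair (residual energy, distance to a reference point), the node names, and the static adjacency list are hypothetical, and the paper's continuous (dynamic) skyline maintenance is not modelled.

```python
from collections import deque


def dominates(a, b):
    """a, b are (energy, distance); higher energy and lower distance are better."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def skyline(nodes):
    """Return the names of nodes that no other node dominates."""
    return sorted(
        name for name, value in nodes.items()
        if not any(dominates(other, value)
                   for other_name, other in nodes.items() if other_name != name)
    )


def within_hops(adjacency, head, r):
    """Breadth-first search: all nodes reachable from `head` in at most r hops."""
    hops = {head: 0}
    queue = deque([head])
    while queue:
        node = queue.popleft()
        if hops[node] == r:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in hops:
                hops[neighbour] = hops[node] + 1
                queue.append(neighbour)
    return set(hops)


# Hypothetical snapshot: node -> (residual energy, distance to a reference point)
nodes = {"n1": (0.9, 5.0), "n2": (0.7, 2.0), "n3": (0.6, 4.0), "n4": (0.9, 6.0)}
adjacency = {"n1": ["n2", "n4"], "n2": ["n1", "n3"], "n3": ["n2"],
             "n4": ["n1", "n5"], "n5": ["n4"]}
candidates = skyline(nodes)
cluster = within_hops(adjacency, "n2", 1)
```

The skyline keeps every node that is a best trade-off between energy and position, so neither attribute has to be collapsed into a single weighted score before a head is chosen.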
