• Title/Summary/Keyword: 베이스

Committee Learning Classifier based on Attribute Value Frequency (속성 값 빈도 기반의 전문가 다수결 분류기)

  • Lee, Chang-Hwan;Jung, In-Chul;Kwon, Young-S.
    • Journal of KIISE:Databases
    • /
    • v.37 no.4
    • /
    • pp.177-184
    • /
    • 2010
  • These days, massive amounts of data, such as sensor, delivery, credit, and stock data, are generated continuously. It is difficult to learn from such data because they are large in volume and their concepts change quickly. To handle these problems, learning methods based on sliding windows over time have been used, but these approaches must rebuild the model every time new data arrive, which costs considerable time and resources. Very simple incremental learning methods are therefore needed. The Bayesian method is one such method, but it has the disadvantage of requiring prior knowledge (probabilities) of the data. In this study, we propose a learning method based on attribute values that can be applied even when the prior probabilities of the data are unknown. The main idea is that each attribute value is regarded as an expert learner, and summing the votes of these expert learners leads to better results. Experimental results show that the proposed method learns from data very fast and performs well compared with existing learning methods (decision tree and Bayesian).
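The abstract describes each attribute value acting as a voting expert over observed class frequencies. As a rough illustration (not the authors' implementation), a minimal incremental committee classifier might look like the following; the class name AttributeValueCommittee and the normalized-vote scheme are assumptions of this sketch.

```python
from collections import defaultdict

class AttributeValueCommittee:
    """Minimal sketch: every attribute value keeps per-class frequency counts
    and votes as an 'expert'; the votes of all experts are summed."""

    def __init__(self):
        # counts[(attribute_index, value)][class_label] -> observed frequency
        self.counts = defaultdict(lambda: defaultdict(int))

    def learn_one(self, x, y):
        # Incremental update: only counters change, no model rebuild needed.
        for i, v in enumerate(x):
            self.counts[(i, v)][y] += 1

    def predict(self, x):
        votes = defaultdict(float)
        for i, v in enumerate(x):
            expert = self.counts.get((i, v))
            if not expert:
                continue  # unseen attribute value: this expert abstains
            total = sum(expert.values())
            for label, freq in expert.items():
                votes[label] += freq / total  # one expert's normalized vote
        return max(votes, key=votes.get) if votes else None


clf = AttributeValueCommittee()
clf.learn_one(("sunny", "hot"), "no")
clf.learn_one(("rainy", "mild"), "yes")
print(clf.predict(("sunny", "mild")))
```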

Relevance Feedback using Region-of-interest in Retrieval of Satellite Images (위성영상 검색에서 사용자 관심영역을 이용한 적합성 피드백)

  • Kim, Sung-Jin;Chung, Chin-Wan;Lee, Seok-Lyong;Kim, Deok-Hwan
    • Journal of KIISE:Databases
    • /
    • v.36 no.6
    • /
    • pp.434-445
    • /
    • 2009
  • Content-based image retrieval (CBIR) is a retrieval technique that uses the contents of images. However, in contrast to text data, multimedia data are ambiguous, and there is a large gap between the system's low-level representation and the human's high-level concept, so points that are near each other in the vector space are not always similar to the user. This is called the semantic-gap problem, and it degrades the performance of image retrieval. To address it, relevance feedback (RF), which uses the user's feedback information, is employed. Existing RF, however, does not consider the user's region-of-interest (ROI), so irrelevant regions are used in computing new query points; because the system does not know the user's ROI, RF proceeds at the image level. In this paper, we propose a new ROI RF method that guides the user to select ROIs from relevant images for the retrieval of complex satellite images, improving retrieval accuracy by computing more accurate query points. We also propose a pruning technique that further improves accuracy by using the images not selected by the user. Experiments show the effectiveness of the proposed ROI RF and pruning technique.
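The query-point refinement and pruning described above could be sketched as follows, under the assumption of a Rocchio-style update on ROI feature vectors (a stand-in for the paper's exact computation); the function names and weights are illustrative.

```python
import numpy as np

def refine_query(query, roi_features, nonselected_features,
                 alpha=1.0, beta=0.8, gamma=0.3):
    """Rocchio-style update: pull the query point toward features of the
    user-selected ROIs and away from images the user did not select."""
    q = alpha * np.asarray(query, dtype=float)
    if len(roi_features):
        q = q + beta * np.mean(np.asarray(roi_features, dtype=float), axis=0)
    if len(nonselected_features):
        q = q - gamma * np.mean(np.asarray(nonselected_features, dtype=float), axis=0)
    return q

def prune_candidates(candidates, nonselected_features, radius):
    """Drop candidates that fall within `radius` of any non-selected image."""
    neg = np.asarray(nonselected_features, dtype=float)
    kept = []
    for c in np.asarray(candidates, dtype=float):
        if len(neg) == 0 or np.linalg.norm(neg - c, axis=1).min() > radius:
            kept.append(c)
    return kept
```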

A Filtering Technique of Streaming XML Data based Postfix Sharing for Partial matching Path Queries (부분매칭 경로질의를 위한 포스트픽스 공유에 기반한 스트리밍 XML 데이타 필터링 기법)

  • Park Seog;Kim Young-Soo
    • Journal of KIISE:Databases
    • /
    • v.33 no.1
    • /
    • pp.138-149
    • /
    • 2006
  • As sensor-network and ubiquitous-computing environments emerge, there is growing demand for handling continuous, fast-arriving data such as streaming data. With the start of research on streaming data, work on managing streaming data in Publish-Subscribe systems has also begun, and the recent emergence of XML as a standard for information exchange on the Internet has increased interest in such systems. Existing Publish-Subscribe systems filter streaming XML data with automata-based schemes, of which YFilter is one of the most popular. YFilter exploits commonality among path queries by sharing the common prefixes of the paths, so that shared prefixes are processed at most once, using a top-down approach. However, because partial matching path queries interrupt common prefix sharing and do not start from the root, YFilter's throughput decreases. We therefore share commonality among path queries through the common postfixes of the paths and use a bottom-up approach instead of the top-down approach; we call this filtering technique PoSFilter. We validate the technique by comparing its throughput with that of YFilter.
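A toy version of postfix sharing with bottom-up matching, assuming simple '/'-separated path queries and ignoring XML attributes and predicates (this is not the actual PoSFilter implementation):

```python
def build_postfix_trie(path_queries):
    """Share common postfixes by inserting each query's steps in reverse."""
    trie = {}
    for qid, query in enumerate(path_queries):
        node = trie
        for step in reversed(query.strip("/").split("/")):
            node = node.setdefault(step, {})
        node.setdefault("$queries", []).append(qid)
    return trie

def match_bottom_up(trie, element_path):
    """Walk the document path from leaf toward root, collecting matched queries."""
    matched, node = [], trie
    for step in reversed(element_path.strip("/").split("/")):
        node = node.get(step)
        if node is None:
            break
        matched.extend(node.get("$queries", []))
    return matched

trie = build_postfix_trie(["a/b/c", "x/b/c", "b/c"])
print(match_bottom_up(trie, "/root/a/b/c"))  # matches the queries "b/c" and "a/b/c"
```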

Data Allocation for Multiple Broadcast Channels (다중 방송채널을 위한 데이타 할당)

  • Jung Sungwon;Nam Seunghoon;Jeong Horyun;Lee Wontaek
    • Journal of KIISE:Databases
    • /
    • v.33 no.1
    • /
    • pp.86-101
    • /
    • 2006
  • Channel bandwidth and the power of mobile devices are limited in a wireless environment, and data broadcast has become an excellent technique for efficient data dissemination in this setting. A significant amount of research has been done on generating efficient broadcast programs for sets of data items with different access frequencies over both single and multiple wireless broadcast channels. In this paper, an efficient data allocation method over multiple wireless broadcast channels is explored. In traditional approaches, a set of data items is partitioned across a number of channels based on access probabilities. However, these approaches ignore the variation of access probabilities among the data items allocated to each channel. In practice, it is difficult to have many broadcast channels, so each channel needs to broadcast many data items. Therefore, if the data items broadcast in each channel have different repetition frequencies based on their access frequencies, performance will be much better than with the traditional approaches. In this paper, we propose an adaptive data allocation technique based on data access probabilities over multiple broadcast channels. Our technique adapts the repetition frequency of each data item within each channel by taking its access probability into account.
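One possible reading of per-channel repetition frequencies, sketched with a greedy channel partition and a square-root weighting heuristic (the heuristic is an assumption for illustration, not necessarily the allocation rule proposed in the paper):

```python
import math

def allocate(items, num_channels):
    """Greedy sketch: assign each (item_id, access_prob) pair to the currently
    lightest channel, then give every item a within-channel repetition weight
    proportional to the square root of its access probability."""
    channels = [{"load": 0.0, "items": []} for _ in range(num_channels)]
    for item_id, p in sorted(items, key=lambda it: -it[1]):
        ch = min(channels, key=lambda c: c["load"])
        ch["items"].append((item_id, p))
        ch["load"] += p
    for ch in channels:
        total = sum(math.sqrt(p) for _, p in ch["items"]) or 1.0
        ch["repetition"] = {i: math.sqrt(p) / total for i, p in ch["items"]}
    return channels

print(allocate([("a", 0.5), ("b", 0.2), ("c", 0.2), ("d", 0.1)], 2))
```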

An Energy-Efficient Concurrency Control Method for Mobile Transactions with Skewed Data Access Patterns in Wireless Broadcast Environments (무선 브로드캐스트 환경에서 편향된 엑세스 패턴을 가진 모바일 트랜잭션을 위한 효과적인 동시성 제어 기법)

  • Jung, Sung-Won;Park, Sung-Geun;Choi, Keun-Ha
    • Journal of KIISE:Databases
    • /
    • v.33 no.1
    • /
    • pp.69-85
    • /
    • 2006
  • Broadcast has often been used to efficiently disseminate frequently requested data to a large number of mobile clients over single or multiple channels. Conventional concurrency control protocols for mobile transactions are not suitable for wireless broadcast environments due to the limited bandwidth of the up-link communication channel. In wireless broadcast environments, the server often broadcasts different data items with different frequencies to reflect the data access patterns of mobile transactions. Previously proposed concurrency control protocols for mobile transactions in wireless broadcast environments focus on transactions with uniform data access patterns, and they perform poorly when the access patterns of update transactions are skewed rather than uniform: update transactions with skewed access patterns are frequently aborted and restarted due to update conflicts on the same highly accessed data items. In this paper, we propose an energy-efficient concurrency control protocol for mobile transactions with skewed as well as uniform data access patterns. Our protocol uses a random back-off technique to avoid frequent aborts and restarts of update transactions. We present an in-depth experimental analysis of our method by comparing it with existing concurrency control protocols; the performance analysis shows that it significantly decreases the average response time and the amount of upstream and downstream bandwidth usage compared with existing protocols.
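The random back-off idea can be illustrated with a small sketch; the exponential slot growth and the ConflictError type are assumptions for illustration, not the paper's exact protocol:

```python
import random
import time

class ConflictError(Exception):
    """Raised when validation against broadcast control information detects
    an update conflict on a hot data item."""

def run_with_random_backoff(run_transaction, max_attempts=8, slot=0.05):
    """On a conflict, wait a random, exponentially growing interval before
    restarting, so transactions touching the same hot items do not collide
    again in the very next broadcast cycle."""
    for attempt in range(max_attempts):
        try:
            return run_transaction()
        except ConflictError:
            time.sleep(random.uniform(0, slot * (2 ** attempt)))
    raise RuntimeError("transaction aborted too many times")
```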

n-Gram/2L: A Space and Time Efficient Two-Level n-Gram Inverted Index Structure (n-gram/2L: 공간 및 시간 효율적인 2단계 n-gram 역색인 구조)

  • Kim Min-Soo;Whang Kyu-Young;Lee Jae-Gil;Lee Min-Jae
    • Journal of KIISE:Databases
    • /
    • v.33 no.1
    • /
    • pp.12-31
    • /
    • 2006
  • The n-gram inverted index has two major advantages: it is language-neutral and error-tolerant. Due to these advantages, it has been widely used in information retrieval and in similar-sequence matching for DNA and protein databases. Nevertheless, the n-gram inverted index also has drawbacks: its size tends to be very large, and query performance tends to be poor. In this paper, we propose the two-level n-gram inverted index (simply, the n-gram/2L index), which significantly reduces the size and improves the query performance while preserving the advantages of the n-gram inverted index. The proposed index eliminates the redundancy of the position information that exists in the n-gram inverted index. It is constructed in two steps: 1) extracting subsequences of length m from documents and 2) extracting n-grams from those subsequences. We formally prove that this two-step construction is identical to the relational normalization process that removes the redundancy caused by a non-trivial multivalued dependency. The n-gram/2L index has excellent properties: 1) it significantly reduces the size and improves the performance compared with the n-gram inverted index, and these improvements become more marked as the database size grows; 2) the query processing time increases only very slightly as the query length gets longer. Experimental results using databases of 1 GByte show that the size of the n-gram/2L index is reduced by up to 1.9~2.7 times and, at the same time, the query performance is improved by up to 13.1 times compared with the n-gram inverted index.
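A simplified sketch of the two-step construction on a single document string, with subsequences overlapping by n-1 characters so that no n-gram is lost at a boundary (tail handling and document identifiers are omitted; this is not the paper's implementation):

```python
def build_ngram_2l(doc, m=4, n=2):
    """Two-step sketch: (1) cut the document into length-m subsequences that
    overlap by n-1 characters, (2) index the n-grams inside each distinct
    subsequence. Positions are kept at two levels instead of one flat list."""
    front = {}       # subsequence_id -> offsets of the subsequence in the document
    subseq_ids = {}  # subsequence text -> subsequence_id
    for off in range(0, len(doc) - m + 1, m - n + 1):
        s = doc[off:off + m]
        sid = subseq_ids.setdefault(s, len(subseq_ids))
        front.setdefault(sid, []).append(off)
    back = {}        # n-gram -> (subsequence_id, offset inside the subsequence)
    for s, sid in subseq_ids.items():
        for j in range(len(s) - n + 1):
            back.setdefault(s[j:j + n], []).append((sid, j))
    return front, back

front, back = build_ngram_2l("abracadabra", m=4, n=2)
print(back["ab"])  # positions expressed relative to subsequences, not the document
```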

Comparative Analysis of KoMCI 2004 and KCI 2004 Impact Factors (KoMCI(Korean Medical Citation Index)와 KCI(Korea Citation Index)의 2004년도 영향력지표값 비교분석)

  • Sun, Huh;Lee, Choon-Shil
    • Journal of Information Management
    • /
    • v.36 no.3
    • /
    • pp.183-193
    • /
    • 2005
  • The Korean Academy of Medical Sciences began developing the Korean Medical Citation Index (KoMCI) database in 2002 and has announced the impact factors of Korean medical journals published since 2000. In July 2005, the Korea Research Foundation also announced the KCI impact factors of journals covering all subject areas for 2003 and 2004. We compared the impact factor (IF), the impact factor excluding self-citation (ZIF), and the self-citation impact factor (SIF) of KoMCI 2004 and KCI 2004 in order to explain why the impact factor values differ so greatly between the two databases. Out of 72 medical journals present in both databases, 59 journals were compared after excluding those with missing data in KCI. The mean IF of KoMCI 2004 was 0.2 and that of KCI 2004 was 0.03 (p=0.0000); the mean ZIF of KoMCI was 0.06 and that of KCI was 0.01 (p=0.000); the mean SIF of KoMCI was 0.139 and that of KCI was 0.02 (p=0.0000). We presume that the major difference in impact factor values originates from the fact that KCI does not perform authority control on the journal names cited in references. We strongly recommend such authority control, especially if the Korea Research Foundation wants to ensure the validity and reliability of KCI data in the evaluation of Korean journals.
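A minimal sketch of the three indicators as implied by the abstract (IF over all citations, ZIF excluding journal self-citations, SIF over self-citations only, so that IF = ZIF + SIF); the exact citation and publication windows used by KoMCI and KCI are assumptions here:

```python
def impact_factors(citations_total, citations_self, articles_prev_two_years):
    """IF:  citations in the index year to articles of the previous two years,
            divided by the number of those articles.
       ZIF: the same ratio after excluding journal self-citations.
       SIF: the ratio computed from self-citations only (IF = ZIF + SIF)."""
    if_value = citations_total / articles_prev_two_years
    zif = (citations_total - citations_self) / articles_prev_two_years
    sif = citations_self / articles_prev_two_years
    return if_value, zif, sif

print(impact_factors(citations_total=40, citations_self=28,
                     articles_prev_two_years=200))  # -> (0.2, 0.06, 0.14)
```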

Dependency-based Framework of Combining Multiple Experts for Recognizing Unconstrained Handwritten Numerals (무제약 필기 숫자를 인식하기 위한 다수 인식기를 결합하는 의존관계 기반의 프레임워크)

  • Kang, Hee-Joong;Lee, Seong-Whan
    • Journal of KIISE:Software and Applications
    • /
    • v.27 no.8
    • /
    • pp.855-863
    • /
    • 2000
  • Although the Behavior-Knowledge Space (BKS) method, one of the well-known decision combination methods, requires no assumptions when combining multiple experts, it theoretically needs exponential storage space to store and manage the jointly observed K decisions from K experts. That is, combining K experts requires a (K+1)st-order probability distribution, and it is well known that such a distribution becomes unmanageable to store and estimate even for small K. To overcome this weakness, previous studies have decomposed the probability distribution into a number of component distributions and approximated it with a product of those component distributions. One such approach applies a conditional independence assumption to the distribution; another approximates the distribution with a product of only first-order tree dependencies or second-order distributions, as shown in [1]. In this paper, dependencies of order higher than the first are considered in approximating the distribution, and a dependency-based framework is proposed to optimally approximate the (K+1)st-order probability distribution with a product set of dth-order dependencies, where $1 \le d \le K$, and to combine multiple experts based on that product set using the Bayesian formalism. The framework was evaluated experimentally on a standardized CENPARMI database.
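For orientation, the simplest special case of such a product approximation (d = 1, i.e., a conditional-independence combination) can be sketched as follows; the dictionaries class_priors and confusion are hypothetical inputs estimated from validation data:

```python
import math

def combine_experts(decisions, class_priors, confusion):
    """Sketch of the d = 1 (conditional-independence) case of Bayesian
    decision combination: P(c | e_1..e_K) is proportional to
    P(c) * prod_k P(e_k | c), with P(e_k | c) taken from expert k's
    confusion statistics. The paper's framework generalizes this to a
    product of dth-order dependencies (1 <= d <= K)."""
    scores = {}
    for c, prior in class_priors.items():
        log_p = math.log(prior)
        for k, e_k in enumerate(decisions):
            log_p += math.log(confusion[k].get((e_k, c), 1e-9))  # smoothing floor
        scores[c] = log_p
    return max(scores, key=scores.get)
```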

Online Signature Verification by Visualization of Dynamic Characteristics using New Pattern Transform Technique (동적 특성의 시각화를 수행하는 새로운 패턴변환 기법에 의한 온라인 서명인식 기술)

  • Chi Suyoung;Lee Jaeyeon;Oh Weongeun;Kim Changhun
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.7
    • /
    • pp.663-673
    • /
    • 2005
  • An analysis model for the dynamics information of two-dimensional time-series patterns is described. In the proposed model, two novel transforms that visualize dynamic characteristics are introduced. The first transform, referred to as speed equalization, reproduces a time-series pattern assuming a constant linear velocity to effectively model the temporal characteristics of the signing process. The second transform, referred to as velocity transform, maps the signal onto a horizontal-vs-vertical velocity plane, where the variation of the velocities over time is represented as a visible shape. With these transforms, the dynamic characteristics of the original signing process are reflected in the shape of the transformed patterns, and analyzing those shapes naturally yields an effective analysis of the dynamic characteristics. The proposed transform technique is applied to an online signature verification problem for evaluation. In experiments on a large signature database, the performance measured in EER (Equal Error Rate) improved to 1.17%, compared to 1.93% for a traditional signature verification algorithm that does not use transformed patterns. In skilled-forgery experiments the improvement was even more pronounced, demonstrating that the parameter set extracted from the transformed patterns is more discriminative in rejecting forgeries.
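The two transforms can be sketched directly from their descriptions; the resampling resolution and the use of numpy interpolation are assumptions of this illustration:

```python
import numpy as np

def speed_equalize(x, y, num_points=256):
    """Resample a pen trajectory to constant linear velocity (equal arc-length
    spacing), so the reproduced pattern's shape encodes the original timing."""
    pts = np.column_stack([x, y]).astype(float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])      # cumulative arc length
    t = np.linspace(0.0, s[-1], num_points)
    return np.interp(t, s, pts[:, 0]), np.interp(t, s, pts[:, 1])

def velocity_transform(x, y, dt=1.0):
    """Map the signature onto the horizontal-vs-vertical velocity plane,
    where the time variation of velocity becomes a visible shape."""
    vx = np.gradient(np.asarray(x, dtype=float), dt)
    vy = np.gradient(np.asarray(y, dtype=float), dt)
    return vx, vy
```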

Classification of the Architectures of Web based Expert Systems (웹기반 전문가시스템의 구조 분류)

  • Lim, Gyoo-Gun
    • Journal of Intelligence and Information Systems
    • /
    • v.13 no.4
    • /
    • pp.1-16
    • /
    • 2007
  • With the expansion of Internet use and e-business, an increasing number of studies address intelligence-based systems in preparation for the ubiquitous environment. In addition, expert systems have evolved from stand-alone types to web-based client-server types, which are now used in various Internet environments. In this paper, we investigate the development environment of web-based expert systems, classify and analyze them by type, and suggest typical models of web-based expert systems and their architectures. We classify web-based expert systems from two perspectives. First, based on load balancing between client and server, we distinguish a Server Oriented model and a Client Oriented model. Second, based on the degree of knowledge and inference sharing, we distinguish the No Sharing, Server Sharing, Client Sharing, and Client-Server Sharing models. Combining the two perspectives yields eight types of web-based expert systems. We also analyze where Knowledge Bases, Fact Bases, and Inference Engines should be located on the Internet, and examine the pros and cons, technologies, considerations, and service types for each model. With the proposed framework, more efficient expert systems can be developed for future environments.
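The eight types follow mechanically from crossing the two classification axes; a trivial enumeration (the "Type i" numbering is illustrative, not the paper's labeling):

```python
from itertools import product

load_balancing = ["Server Oriented", "Client Oriented"]
sharing = ["No Sharing", "Server Sharing", "Client Sharing", "Client-Server Sharing"]

# Crossing the two axes yields the eight architecture types discussed in the paper.
for i, (lb, sh) in enumerate(product(load_balancing, sharing), start=1):
    print(f"Type {i}: {lb} / {sh}")
```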
