• Title/Summary/Keyword: The Number of Router

Search Result 112, Processing Time 0.025 seconds

An Active Learning-based Method for Composing Training Document Set in Bayesian Text Classification Systems (베이지언 문서분류시스템을 위한 능동적 학습 기반의 학습문서집합 구성방법)

  • 김제욱;김한준;이상구
    • Journal of KIISE:Software and Applications
    • /
    • v.29 no.12
    • /
    • pp.966-978
    • /
    • 2002
  • There are two important problems in improving text classification systems based on machine learning approach. The first one, called "selection problem", is how to select a minimum number of informative documents from a given document collection. The second one, called "composition problem", is how to reorganize selected training documents so that they can fit an adopted learning method. The former problem is addressed in "active learning" algorithms, and the latter is discussed in "boosting" algorithms. This paper proposes a new learning method, called AdaBUS, which proactively solves the above problems in the context of Naive Bayes classification systems. The proposed method constructs more accurate classification hypothesis by increasing the valiance in "weak" hypotheses that determine the final classification hypothesis. Consequently, the proposed algorithm yields perturbation effect makes the boosting algorithm work properly. Through the empirical experiment using the Routers-21578 document collection, we show that the AdaBUS algorithm more significantly improves the Naive Bayes-based classification system than other conventional learning methodson system than other conventional learning methods

Index-based Searching on Timestamped Event Sequences (타임스탬프를 갖는 이벤트 시퀀스의 인덱스 기반 검색)

  • 박상현;원정임;윤지희;김상욱
    • Journal of KIISE:Databases
    • /
    • v.31 no.5
    • /
    • pp.468-478
    • /
    • 2004
  • It is essential in various application areas of data mining and bioinformatics to effectively retrieve the occurrences of interesting patterns from sequence databases. For example, let's consider a network event management system that records the types and timestamp values of events occurred in a specific network component(ex. router). The typical query to find out the temporal casual relationships among the network events is as fellows: 'Find all occurrences of CiscoDCDLinkUp that are fellowed by MLMStatusUP that are subsequently followed by TCPConnectionClose, under the constraint that the interval between the first two events is not larger than 20 seconds, and the interval between the first and third events is not larger than 40 secondsTCPConnectionClose. This paper proposes an indexing method that enables to efficiently answer such a query. Unlike the previous methods that rely on inefficient sequential scan methods or data structures not easily supported by DBMSs, the proposed method uses a multi-dimensional spatial index, which is proven to be efficient both in storage and search, to find the answers quickly without false dismissals. Given a sliding window W, the input to a multi-dimensional spatial index is a n-dimensional vector whose i-th element is the interval between the first event of W and the first occurrence of the event type Ei in W. Here, n is the number of event types that can be occurred in the system of interest. The problem of‘dimensionality curse’may happen when n is large. Therefore, we use the dimension selection or event type grouping to avoid this problem. The experimental results reveal that our proposed technique can be a few orders of magnitude faster than the sequential scan and ISO-Depth index methods.hods.