• Title/Abstract/Keywords: Indexing Model

169 search results (processing time: 0.019 s)

Statistical Techniques for Automatic Indexing and Some Experiments with Korean Documents

  • 정영미;이태영
    • 한국문헌정보학회지 / Vol. 9 / pp. 99-118 / 1982
  • This paper first reviews various techniques proposed for automatic indexing, with special emphasis on statistical techniques. Frequency-based statistical techniques are grouped, according to their index term selection criteria, into three approaches for further investigation: the term frequency approach, the document frequency approach, and the probabilistic approach. In the experimental part of the study, two techniques were tested: Pao's technique, based on Goffman's transition region formula, and Harter's 2-Poisson distribution model, with a measure of the potential effectiveness of an index term. The experimental collection consists of 30 agriculture-related documents written in Korean. Pao's technique did not yield good results, presumably because of differences in word usage between Korean and English. Harter's model, however, holds some promise for indexing Korean documents, because the evaluation results of this experiment were similar to those Harter reported.

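The transition-region idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the form in which Goffman's transition point is commonly stated, T = (-1 + sqrt(1 + 8 * I1)) / 2, where I1 is the number of distinct words that occur exactly once. The function names, the `width` parameter, and the toy token list are hypothetical, and nothing here reproduces the tokenization or term selection used in the paper's experiments.

```python
import math
from collections import Counter


def goffman_transition_point(tokens):
    """Transition frequency T = (-1 + sqrt(1 + 8 * I1)) / 2.

    I1 is the number of distinct words occurring exactly once.
    """
    counts = Counter(tokens)
    i1 = sum(1 for c in counts.values() if c == 1)
    return (-1 + math.sqrt(1 + 8 * i1)) / 2


def transition_region_terms(tokens, width=1):
    """Words whose frequency lies within `width` of the transition point.

    `width` is an illustrative assumption; the abstract does not say how
    wide the region should be.
    """
    counts = Counter(tokens)
    t = goffman_transition_point(tokens)
    return sorted(w for w, c in counts.items() if abs(c - t) <= width)


# Toy input: three words occur once, so I1 = 3 and T = (-1 + 5) / 2 = 2.
tokens = ["a", "b", "c", "d", "d", "e", "e", "e", "e", "e"]
print(goffman_transition_point(tokens))
print(transition_region_terms(tokens, width=0))
```

Words near the transition frequency are treated as candidate index terms, on the assumption that very frequent words are function words and very rare words are incidental.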

A PROPOSAL OF SEMI-AUTOMATIC INDEXING ALGORITHM FOR MULTI-MEDIA DATABASE WITH USERS' SENSIBILITY

  • Mitsuishi, Takashi;Sasaki, Jun;Funyu, Yutaka
    • 한국감성과학회:학술대회논문집 / Proceedings of the 2000 Spring Conference of KOSES and International Sensibility Ergonomics Symposium / pp. 120-125 / 2000
  • We propose a semi-automatic, dynamic indexing algorithm for multimedia databases (e.g., movie and audio files), for which it is difficult to create indexes that express emotional or abstract content. The algorithm adapts to each user's sensibility by using the users' histories of access to the database. We first categorize the data simply, then build a vector space of each user's interests (the user model) from the history of which categories the accessed data belong to, and a vector space of each data item (the title model) from the history of which users accessed it. By repeating this procedure, we can create suitable indexes that reflect the emotional content of each data item. In this paper, we define recurrence formulas based on the proposed algorithm and show its effectiveness through simulation results.


Precision Measurement System for Ball Screw Pitch Error

  • 박희재;김인기
    • 한국정밀공학회:학술대회논문집 / 한국정밀공학회 1993 Autumn Conference Proceedings / pp. 279-285 / 1993
  • This paper presents a precision automatic measuring system for ball screw pitch. The ball screw is mounted on a precision indexing table, and its pitch is measured with a magnetic scale; both indexing and measurement are controlled by a PC. For precise indexing of the ball screw, a direct-drive motor is coupled to purpose-designed dead and live centers. The performance of the centers, namely their radial, tilt, and axial error motions, is assessed with a precision master cylinder. An error compensation model is constructed for the measurement system, in which the error motions of the indexing system and of the scale measurement system are combined to give the measurement error for the ball screw. The developed system offers manufacturers and users of ball screws an automated means of precision measurement.


Mathematical Performance Model of Two-Tier Indexing Scheme in Wireless Data Broadcasting

  • Im, Seokjin
    • International Journal of Advanced Culture Technology / Vol. 6, No. 4 / pp. 65-70 / 2018
  • A wireless data broadcasting system, which can support any number of clients, is an effective answer to the scalability challenge of ubiquitous computing in an IoT environment. In such a system, it is important to evaluate quickly a key performance parameter, the access time, which indicates how quickly a client can access the desired data items. In this paper, we derive a mathematical model of the access time in a wireless data broadcast system that adopts a two-tier indexing scheme. The derived model makes it possible to evaluate the access time without complicated simulation. To evaluate the model, we compare the access time it predicts with the access time obtained by simulation.

Cost Model of Index Structures for Moving Objects Databases

  • 전봉기
    • 한국정보통신학회논문지 / Vol. 11, No. 3 / pp. 523-531 / 2007
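  • In this paper, we develop a new indexing technique suited to managing moving objects and propose a cost model for it. We also propose a dynamic hashing index with low insertion and deletion costs. The dynamic hashing index structure applies dynamic hashing, which combines a hash and a tree, to spatial indexing. We analyze the cost model for frequent position updates of moving objects together with the dynamic index structure, and verify them through performance evaluation experiments. In the experiments, the newly proposed indexing technique (the dynamic hashing index) outperformed the R-tree and the fixed grid.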

Speaker Tracking Using Eigendecomposition and an Index Tree of Reference Models

  • Moattar, Mohammad Hossein;Homayounpour, Mohammad Mehdi
    • ETRI Journal / Vol. 33, No. 5 / pp. 741-751 / 2011
  • This paper focuses on online speaker tracking for telephone conversations and broadcast news. Since online operation imposes limitations on the tracking strategy, such as data insufficiency, a reliable approach is needed to compensate for this shortage. In this framework, a set of reference speaker models is used as side information to facilitate online tracking. To improve the indexing accuracy, adaptation approaches in the eigenvoice decomposition space are proposed. We believe that eigenvoice adaptation techniques help to embed the speaker space in the models and hence enrich the generality of the selected speaker models. In addition, an index structure over the reference models is proposed to speed up the search in the model space. The proposed framework is evaluated on the 2002 Rich Transcription Broadcast News and Conversational Telephone Speech corpora as well as on a synthetic dataset. The indexing errors of the proposed framework on telephone conversations, broadcast news, and the synthetic dataset are 8.77%, 9.36%, and 12.4%, respectively. With the index tree structure, the run time of the proposed framework is improved by 22%.

A Separated Indexing Technique for Efficient Evaluation of Nested Queries

  • 권영무;박용진
    • 전자공학회논문지B / Vol. 29B, No. 7 / pp. 11-22 / 1992
  • In this paper, a new indexing technique is proposed for the efficient evaluation of nested queries on the aggregation hierarchy in the object-oriented data model. As the index data structure, an extended $B^{+}$-tree is introduced in which the instance identifiers to be searched are stored in leaf nodes and the path information used for updating index records is stored in subleaf nodes. Retrieval and update algorithms for the introduced index structure are provided. Comparisons with existing indexing techniques under a variety of conditions show improved performance in cost, i.e., in the total number of pages accessed for retrieval and update.


N-gram Based Robust Spoken Document Retrievals for Phoneme Recognition Errors

  • 이수장;박경미;오영환
    • 대한음성학회지:말소리 / No. 67 / pp. 149-166 / 2008
  • In spoken document retrieval (SDR), subword (typically phoneme) indexing terms are used to avoid the out-of-vocabulary (OOV) problem. They make the indexing and retrieval process independent of any vocabulary and require only a small corpus to train the acoustic model. However, the subword indexing approach has a major drawback: it shows higher word error rates than a large vocabulary continuous speech recognition (LVCSR) system. In this paper, we propose a probabilistic slot detection and n-gram based string matching method for phone-based spoken document retrieval to overcome the high error rates of the phone recognizer. Experimental results show a 9.25% relative improvement in mean average precision (mAP) with a 1.7-fold speedup compared with the baseline system.

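To illustrate why n-gram matching over phone sequences tolerates recognition errors, here is a minimal sketch of a plain n-gram overlap score. It is not the authors' method: the probabilistic slot detection step is omitted, and the function names, the scoring rule, and the toy phone strings are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(phones, n=2):
    """All contiguous n-grams of a phone sequence, as tuples."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def ngram_match_score(query, document, n=2):
    """Fraction of the query's n-grams that also occur in the document.

    Counts are clipped, so a repeated query n-gram must occur as many
    times in the document to be fully credited.
    """
    q = Counter(ngrams(query, n))
    d = Counter(ngrams(document, n))
    total = sum(q.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, d[gram]) for gram, count in q.items())
    return matched / total


query = "k a t s".split()
clean_doc = "dh ax k a t s ae t".split()
noisy_doc = "dh ax k a d s".split()  # "t" misrecognized as "d"

print(ngram_match_score(query, clean_doc))
print(ngram_match_score(query, noisy_doc))
```

A single substituted phone destroys only the n-grams that span it, so the noisy document still receives a nonzero score, whereas exact matching of the whole phone string would fail outright.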

Indexing and Retrieving of Video Data

  • 허진용;박동원;안성옥
    • 공학논문집 / Vol. 3, No. 1 / pp. 107-116 / 1998
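  • The purpose of this paper is to develop an environment in which MPEG video can be stored in a multimedia database management system, searched in real time, and reconstructed, within the client/server environment of a high-speed information network. We build the ATM communication environment essential for multimedia data transmission and a system that links MPEG-2 video with the ATM network, and we analyze MPEG-2 TS data to develop methods for extracting the I-frames and key frames essential for video retrieval. We also develop an index editor for assigning indexes to the extracted key frames and a video retriever that searches through those indexes, and we design and build a multimedia database schema for managing the key frames and their associated indexes.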


A Column-Aware Index Management Using Flash Memory for Read-Intensive Databases

  • Byun, Si-Woo;Jang, Seok-Woo
    • Journal of Information Processing Systems / Vol. 11, No. 3 / pp. 389-405 / 2015
  • Most traditional database systems use a record-oriented model in which the attributes of a record are placed contiguously on a hard disk to achieve high-performance writes. For read-mostly data warehouse systems, however, the column-oriented database has become a suitable model because of its superior read performance. Today, flash memory is widely recognized as the preferred storage medium for high-speed database systems. In this paper, we introduce a column-oriented database model based on flash memory and then propose a new column-aware flash indexing scheme for high-speed column-oriented data warehouse systems. Our index management scheme, which uses an enhanced $B^+$-Tree, achieves superior search performance by indexing an embedded segment and packing unused space in internal and leaf nodes. Based on the performance results for two test databases, we conclude that column-aware flash index management outperforms the traditional scheme in terms of mixed-operation throughput and response time.