Search | Korea Science

Implementation of the Inverted File for Indexing Large-volume Data (대용량 데이터 색인에 적합한 역파일의 구현)

Sung Chae Lim
- Proceedings of the Korea Information Processing Society Conference / 2008.11a / pp.909-912 / 2008
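Inverted-file indexing is widely used for keyword search over large document collections. A key consideration when implementing an inverted-file index is how to minimize disk usage during keyword query processing. If the inverted file is small, disk I/O is low, and disk usage can be reduced greatly by keeping the inverted file loaded in memory when needed. However, when the index data is very large, as in web search or large library systems, the disk cost of reading the inverted file can increase sharply. This paper proposes an inverted-file structure that minimizes disk usage in search environments that use very large inverted files. The proposed structure is designed as a hierarchy that takes the query processing steps into account, and it has been applied to a real commercial system, where it demonstrated its stability and performance.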
https://doi.org/10.3745/PKIPS.y2008m011a.909
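
The abstract above does not describe the proposed hierarchy in detail, so the following is only a minimal sketch of the general idea of a layered inverted file: a small term dictionary kept in memory points into a large posting file, so that a query reads only the posting lists it needs. The function names, the 4-byte document-id layout, and the use of `io.BytesIO` as a stand-in for the on-disk file are all assumptions made for illustration, not the paper's design.

```python
import io
import struct
from collections import defaultdict

def build_inverted_file(docs):
    """Return (dictionary, postings_file) for a {doc_id: text} mapping.

    dictionary maps term -> (byte offset, posting count); it is the small
    upper level meant to stay in memory. postings_file stands in for the
    large on-disk lower level (hypothetical layout: 4-byte doc ids).
    """
    postings = defaultdict(list)
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].lower().split()):
            postings[term].append(doc_id)

    dictionary = {}
    postings_file = io.BytesIO()
    for term in sorted(postings):
        ids = postings[term]
        dictionary[term] = (postings_file.tell(), len(ids))
        postings_file.write(struct.pack(f"<{len(ids)}I", *ids))
    return dictionary, postings_file

def lookup(dictionary, postings_file, term):
    """Read one term's posting list with a single seek and a single read."""
    if term not in dictionary:
        return []
    offset, count = dictionary[term]
    postings_file.seek(offset)
    return list(struct.unpack(f"<{count}I", postings_file.read(4 * count)))

def search_all(dictionary, postings_file, terms):
    """AND query: intersect posting lists, shortest list first."""
    ordered = sorted(terms, key=lambda t: dictionary.get(t, (0, 0))[1])
    result = None
    for term in ordered:
        ids = set(lookup(dictionary, postings_file, term))
        result = ids if result is None else result & ids
        if not result:
            break  # no document can match; skip the remaining reads
    return sorted(result or [])
```

Processing the shortest posting list first lets an AND query stop early, which is one simple way to avoid reading long lists from disk.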

n-Gram/2L: A Space and Time Efficient Two-Level n-Gram Inverted Index Structure (n-gram/2L: 공간 및 시간 효율적인 2단계 n-gram 역색인 구조)

Kim Min-Soo;Whang Kyu-Young;Lee Jae-Gil;Lee Min-Jae
- Journal of KIISE:Databases / v.33 no.1 / pp.12-31 / 2006
The n-gram inverted index has two major advantages: it is language-neutral and error-tolerant. Owing to these advantages, it has been widely used in information retrieval and in similar-sequence matching for DNA and protein databases. Nevertheless, the n-gram inverted index also has drawbacks: its size tends to be very large, and query performance tends to be poor. In this paper, we propose the two-level n-gram inverted index (simply, the n-gram/2L index), which significantly reduces the size and improves query performance while preserving the advantages of the n-gram inverted index. The proposed index eliminates the redundancy of the position information that exists in the n-gram inverted index. The proposed index is constructed in two steps: 1) extracting subsequences of length m from documents and 2) extracting n-grams from those subsequences. We formally prove that this two-step construction is identical to the relational normalization process that removes the redundancy caused by a non-trivial multivalued dependency. The n-gram/2L index has excellent properties: 1) it significantly reduces the size and improves the performance compared with the n-gram inverted index, with these improvements becoming more marked as the database size gets larger; 2) the query processing time increases only very slightly as the query length gets longer. Experimental results using databases of 1 GByte show that the size of the n-gram/2L index is reduced by up to 1.9 to 2.7 times and, at the same time, the query performance is improved by up to 13.1 times compared with those of the n-gram inverted index.
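
The two-step construction described in the abstract above can be sketched as follows. This is an illustrative toy, not the authors' implementation: the function names are hypothetical, and the choice to let consecutive subsequences overlap by n-1 characters is an assumption made here so that no n-gram straddling a subsequence boundary is lost.

```python
from collections import defaultdict

def build_ngram_2l(docs, m, n):
    """Build a two-level n-gram index for a {doc_id: text} mapping.

    back_end:  subsequence -> [(doc_id, offset of subsequence in document)]
    front_end: n-gram      -> [(subsequence, offset of n-gram in subsequence)]
    """
    if not 0 < n <= m:
        raise ValueError("need 0 < n <= m")
    step = m - (n - 1)  # assumed overlap of n-1 characters between subsequences
    back_end = defaultdict(list)
    for doc_id, text in docs.items():
        # Step 1: extract subsequences of length m (the last may be shorter).
        for start in range(0, max(len(text) - n + 1, 1), step):
            back_end[text[start:start + m]].append((doc_id, start))

    front_end = defaultdict(list)
    # Step 2: extract n-grams once per distinct subsequence, so a repeated
    # subsequence does not repeat its n-gram position information.
    for subseq in back_end:
        for off in range(len(subseq) - n + 1):
            front_end[subseq[off:off + n]].append((subseq, off))
    return back_end, front_end

def ngram_positions(back_end, front_end, gram):
    """Recover every (doc_id, offset) of an n-gram by joining both levels."""
    hits = set()
    for subseq, off in front_end.get(gram, []):
        for doc_id, start in back_end[subseq]:
            hits.add((doc_id, start + off))
    return sorted(hits)
```

For example, with `docs = {1: "abcabcabc", 2: "xabcx"}`, `m = 4`, and `n = 2`, the subsequence `"abca"` occurs twice in document 1 but its n-grams are stored only once in the front-end level; the document-level positions are recovered by the join in `ngram_positions`.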

Kullback-Leibler Information-Based Tests of Fit for Inverse Gaussian Distribution (역가우스분포에 대한 쿨백-라이블러 정보 기반 적합도 검정)

Choi, Byung-Jin
- The Korean Journal of Applied Statistics / v.24 no.6 / pp.1271-1284 / 2011
The entropy-based test of fit for the inverse Gaussian distribution presented by Mudholkar and Tian (2002) can only be applied to the composite hypothesis that a sample is drawn from an inverse Gaussian distribution with both the location and scale parameters unknown. In practice, however, a researcher may want a test of fit for an inverse Gaussian distribution with one parameter known or with both parameters known. In this paper, we introduce tests of fit for the inverse Gaussian distribution based on the Kullback-Leibler information as an extension of the entropy-based test. A window size must be chosen to implement the proposed tests. By means of Monte Carlo simulations, window sizes are determined for a wide range of sample sizes, and the corresponding critical values of the test statistics are estimated. The results of a power analysis for various alternatives show that the Kullback-Leibler information-based goodness-of-fit tests have good power.
https://doi.org/10.5351/KJAS.2011.24.6.1271

An Update-Efficient, Disk-Based Inverted Index Structure for Keyword Search on Data Streams (데이터 스트림에 대한 키워드 검색을 위한, 효율적인 갱신이 가능한 디스크 기반 역색인 구조)

Park, Eun Ju;Lee, Ki Yong
- KIPS Transactions on Software and Data Engineering / v.5 no.4 / pp.171-180 / 2016
As social networking services such as Twitter become increasingly popular, data streams have become widely prevalent. To search data accumulated from data streams efficiently, an index structure is essential. In this paper, we propose an update-efficient, disk-based inverted index structure for efficient keyword search on data streams. When new data arrive on the data stream, the index needs to be updated to incorporate them. The traditional inverted index is very inefficient to update in terms of disk I/O, because all index data stored on disk need to be read and written back each time the index is updated. To solve this problem, we divide the whole inverted index into a sequence of inverted indices of exponentially increasing size. When new data arrive, they are first inserted into the smallest index; later, the small indices are merged into the larger indices, which leads to a small amortized update cost per new data item. Furthermore, when indices stored on disk are merged with each other, we minimize the disk I/O cost incurred by the merge operation, resulting in an even smaller update cost. Through various experiments, we compare the update efficiency of the proposed index structure with that of the previous one and show the performance advantage of the proposed structure in terms of update cost.
https://doi.org/10.3745/KTSDE.2016.5.4.171
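
The idea of keeping a sequence of indices with exponentially increasing size can be illustrated with the generic logarithmic-merge scheme below. It is an in-memory sketch under stated assumptions: the class name, the doubling capacities, and the `merged_docs` counter (a rough proxy for rewrite cost) are hypothetical, and the paper's own merge policy and its disk I/O optimizations are not reproduced here.

```python
from collections import defaultdict

class LogMergeIndex:
    """Sequence of inverted indices whose sizes double from level to level."""

    def __init__(self):
        self.levels = []      # levels[i] is None or an index over 2**i documents
        self.merged_docs = 0  # documents rewritten by merges (cost proxy)

    @staticmethod
    def _merge(a, b):
        out = defaultdict(list)
        for index in (a, b):
            for term, ids in index["postings"].items():
                out[term].extend(ids)
        return {"postings": {t: sorted(v) for t, v in out.items()},
                "size": a["size"] + b["size"]}

    def insert(self, doc_id, text):
        # New data always enters as a one-document index at the smallest level.
        carry = {"postings": {t: [doc_id] for t in set(text.lower().split())},
                 "size": 1}
        level = 0
        # Like adding 1 to a binary counter: merge upward while a level is full.
        while level < len(self.levels) and self.levels[level] is not None:
            carry = self._merge(self.levels[level], carry)
            self.merged_docs += carry["size"]
            self.levels[level] = None
            level += 1
        if level == len(self.levels):
            self.levels.append(None)
        self.levels[level] = carry

    def search(self, term):
        """A query must consult every non-empty level."""
        hits = []
        for index in self.levels:
            if index is not None:
                hits.extend(index["postings"].get(term, []))
        return sorted(hits)
```

Because a document is rewritten only when its index is merged into the next level, each document is rewritten a number of times that grows logarithmically with the total, instead of the whole index being rewritten on every insertion. The trade-off is that a query has to look in several indices.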

An Optimal Determination of Subband-Frame Size and Mode Switching Level for Adaptive OFDM-TDD System (시분할 듀플렉싱 기반의 적응 직교 주파수 분할 다중 접속 시스템에서 부대역-프레임 크기와 모드 변환점의 최적 결정 기법)

Shin Kil-Ho;Lee Chang-Suk;Kim Jung-Gon;Kim Hyung-Myung
- The Journal of Korean Institute of Communications and Information Sciences / v.30 no.6C / pp.512-522 / 2005
In this paper, an optimal method for determining the subband-frame size and mode-switching level is proposed for adaptive OFDM-TDD systems in frequency-selective, time-varying channels. The optimization problem, which considers frequency selectivity, user mobility, and the signaling overhead caused by the mode change information, is formulated to maximize spectral efficiency while satisfying the target BER. Assuming that the subband-frame size is given, the mode-switching level is first optimized so that the spectral efficiency is maximized while satisfying the target BER. The subband-frame size that maximizes the spectral efficiency is then selected from among the candidates. Simulation results show that the proposed scheme outperforms conventional schemes in terms of spectral efficiency and BER.

A Modified Entropy-Based Goodness-of-Fit Test for Inverse Gaussian Distribution (역가우스분포에 대한 변형된 엔트로피 기반 적합도 검정)

Choi, Byung-Jin
- The Korean Journal of Applied Statistics / v.24 no.2 / pp.383-391 / 2011
This paper presents a modified entropy-based test of fit for the inverse Gaussian distribution. The test is based on the entropy difference between the unknown data-generating distribution and the inverse Gaussian distribution. The entropy difference estimator used as the test statistic is obtained by employing Vasicek's sample entropy as an entropy estimator for the data-generating distribution and the uniformly minimum variance unbiased estimator as an entropy estimator for the inverse Gaussian distribution. Empirically determined critical values of the test statistic are provided in tabular form. Monte Carlo simulations are performed to compare the proposed test with the previous entropy-based test in terms of power.
https://doi.org/10.5351/KJAS.2011.24.2.383
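
Vasicek's sample entropy, named in the abstract above, is commonly written as the average of log(n/(2m) (X(i+m) - X(i-m))) over the order statistics X(1) <= ... <= X(n), with indices clamped to the range 1..n and m the window size. The sketch below implements only that estimator; the function name is hypothetical, and the test statistic and critical values of the paper are not reproduced.

```python
import math

def vasicek_entropy(sample, m):
    """Vasicek's spacing-based entropy estimate with window size m.

    Order statistics beyond either end of the sample are clamped to the
    smallest or largest observation. Tied observations that make a spacing
    zero cause math.log to raise ValueError.
    """
    n = len(sample)
    if not 1 <= m < n / 2:
        raise ValueError("window size m must satisfy 1 <= m < n/2")
    x = sorted(sample)
    total = 0.0
    for i in range(n):
        hi = x[min(i + m, n - 1)]
        lo = x[max(i - m, 0)]
        total += math.log(n / (2 * m) * (hi - lo))
    return total / n
```

One quick sanity check is the scaling property of entropy: multiplying every observation by a constant c > 0 shifts the estimate by log(c).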

Design of Inverse Square Root Unit Using 2-Stage Pipeline Architecture (2-Stage Pipeline 구조를 이용한 역제곱근 연산기의 설계)

Kim, Jung-Hoon;Kim, Ki-Chul
- Proceedings of the Korean Information Science Society Conference / 2007.10b / pp.198-201 / 2007
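This paper proposes an inverse square root unit that uses a modified Newton-Raphson algorithm and a LUT (look-up table). The Newton-Raphson floating-point reciprocal algorithm computes the reciprocal square root by repeating a fixed number of multiplications. The modified Newton-Raphson algorithm has been adapted to suit hardware implementation, and the LUT has been improved to reduce error. The proposed unit minimizes the LUT size and has a 2-stage pipeline structure rather than an iterative one. It also uses a 24-bit data format based on the IEEE-754 floating-point standard, which is favorable for area and speed and makes the unit suitable for multimedia applications on portable devices. The inverse square root unit is accurate to 8 fractional bits and was designed in VHDL. It occupies about 4,000 gates in a 0.18 µm CMOS process and can operate at 150 MHz.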
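
The combination of a look-up table and Newton-Raphson refinement can be illustrated in software with the textbook iteration y <- y (3 - x y^2) / 2 for 1/sqrt(x). The sketch below is not the paper's modified algorithm, LUT design, or 24-bit format: the table size `LUT_BITS`, the midpoint seeding, and the function name are assumptions chosen only to show how a small table plus one refinement step fit together.

```python
import math

LUT_BITS = 4  # hypothetical table size: 2**LUT_BITS entries

# Seed table over [1, 4): entry k approximates 1/sqrt at the midpoint of bin k.
_LUT = [1.0 / math.sqrt(1.0 + 3.0 * (k + 0.5) / (1 << LUT_BITS))
        for k in range(1 << LUT_BITS)]

def inv_sqrt(x, iterations=1):
    """Approximate 1/sqrt(x) for x > 0 with a LUT seed and Newton-Raphson."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    mant, exp = math.frexp(x)            # x = mant * 2**exp, mant in [0.5, 1)
    mant, exp = mant * 2.0, exp - 1      # mant in [1, 2)
    if exp % 2:                          # force an even exponent
        mant, exp = mant * 2.0, exp - 1  # mant in [2, 4)
    # Step 1: table lookup for the initial estimate of 1/sqrt(mant).
    index = min(int((mant - 1.0) / 3.0 * (1 << LUT_BITS)), len(_LUT) - 1)
    y = _LUT[index]
    # Step 2: Newton-Raphson refinement, multiplications only.
    for _ in range(iterations):
        y = y * (3.0 - mant * y * y) / 2.0
    # 1/sqrt(mant * 2**exp) = (1/sqrt(mant)) * 2**(-exp/2), with exp even.
    return math.ldexp(y, -exp // 2)
```

Each refinement step roughly squares the relative error of the estimate, which is why a small table followed by a fixed, small number of multiplications can be enough for a low-precision target.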

The Design of Broadband Patch Antenna with Microstrip Line-Probe Feeder (마이크로스트립 라인-프로브 급전구조를 갖는 광대역 패치 안테나의 설계)

박종열;이윤경;윤현보
- The Journal of Korean Institute of Electromagnetic Engineering and Science / v.13 no.7 / pp.687-692 / 2002
This paper proposes a design for a broadband patch antenna with a new feeder structure. The antenna operates at a center frequency of 5.8 GHz. The proposed feeder structure provides a broadband characteristic and a reduced antenna size. To confirm the broadband characteristic, the antenna is compared with a probe-fed antenna. After design and fabrication, the antenna bandwidth was enhanced by 34.5% and the patch size was reduced by 45%.

Wave Distribution with Reflection In Dongbaek Island Area (반사율을 고려한 동백섬 해역의 파랑 분포)

유동훈;신수훈
- Proceedings of the Korean Society of Coastal and Ocean Engineers Conference / 2003.08a / pp.254-258 / 2003
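When waves travel from deep water into shallow water, they change rapidly because of shoaling, refraction, diffraction, and friction loss. The grid of a numerical model must therefore be quite fine, depending on the seabed topography and the degree of wave transformation. The grid size is typically around 100 m, and in some cases a grid as fine as 10 m is required. (abridged)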

A Space-Efficient Inverted Index Technique using Data Rearrangement for String Similarity Searches (유사도 검색을 위한 데이터 재배열을 이용한 공간 효율적인 역 색인 기법)

Im, Manu;Kim, Jongik
- Journal of KIISE / v.42 no.10 / pp.1247-1253 / 2015
An inverted index structure is widely used for efficient string similarity search. One of the main requirements of similarity search is a fast response time; to this end, most techniques use an in-memory index structure. Since an inverted index structure is usually very large, however, it is not practical to assume that it will fit into main memory. To alleviate this problem, we propose a novel technique that reduces the size of an inverted index. To reduce the index size, the proposed technique rearranges data strings so that the data strings containing the same q-grams are placed close to one another. The technique then encodes those multiple strings into a range. Through an experimental study using real data sets, we show that our technique significantly reduces the size of an inverted index without sacrificing query processing time.
https://doi.org/10.5626/JOK.2015.42.10.1247
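
The range-encoding idea in the abstract above can be sketched as follows. The abstract does not say how the strings are rearranged, so this toy simply sorts them lexicographically as a stand-in; that ordering, the function names, and the inclusive `(start, end)` range format are assumptions for illustration only.

```python
from collections import defaultdict

def qgrams(s, q):
    """Set of all substrings of length q in s."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}

def to_ranges(ids):
    """Collapse ascending ids into inclusive (start, end) runs."""
    ranges = []
    for i in ids:
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], i)
        else:
            ranges.append((i, i))
    return ranges

def build_range_index(strings, q):
    """Return (ordered strings, {q-gram: [(start_id, end_id), ...]})."""
    # Assumed rearrangement: lexicographic order, so that strings sharing
    # a prefix (and hence some q-grams) receive neighboring ids.
    ordered = sorted(set(strings))
    postings = defaultdict(list)
    for sid, s in enumerate(ordered):
        for gram in qgrams(s, q):
            postings[gram].append(sid)
    return ordered, {g: to_ranges(ids) for g, ids in postings.items()}

def expand(ranges):
    """Decode ranges back into the full list of string ids."""
    return [i for start, end in ranges for i in range(start, end + 1)]
```

With `["apple", "apply", "apricot", "banana", "bandana"]` and `q = 2`, the q-gram `"ap"` is stored as the single range `(0, 2)` instead of three separate ids. How much space is saved depends entirely on how well the rearrangement clusters strings with shared q-grams.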
