Search | Korea Science

An Efficient Information Retrieval System for Unstructured Data Using Inverted Index

Abdullah Iftikhar;Muhammad Irfan Khan;Kulsoom Iftikhar
- International Journal of Computer Science & Network Security / v.24 no.7 / pp.31-44 / 2024
An inverted index combines keywords with their associated posting lists to index documents. The heavy use of technology in the modern age has increased data volume at a very high rate, and big data has become a major concern for researchers. Efficient document indexing in big data is now a major challenge. All organizations and web search engines have limited resources, such as space and storage, which are crucial to data management in an information retrieval system. An information retrieval system therefore needs to be very efficient. This research introduces an inverted indexing technique to minimize the delay in retrieving data from an information retrieval system. The inverted index is illustrated, its issues are discussed, and they are resolved by implementing a scalable inverted index. The existing inverted index algorithm is then compared with the naïve inverted index. The interval list of the inverted index is stored in primary storage rather than in auxiliary memory. The research proposes an efficient information retrieval system architecture, particularly for unstructured data, which has no predefined structure, format, or data volume.
https://doi.org/10.22937/IJCSNS.2024.24.7.4
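
The keyword-to-posting-list structure described in the abstract above can be illustrated with a minimal sketch. This is a generic in-memory inverted index, not the scalable architecture proposed in the paper; the function names `tokenize`, `build_inverted_index`, and `search_all`, and the sample documents, are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to a sorted posting list of document IDs.

    `documents` is a dict of doc_id -> text (an assumed input format).
    """
    index = defaultdict(list)
    for doc_id in sorted(documents):
        # set() ensures each document appears once per posting list.
        for term in set(tokenize(documents[doc_id])):
            index[term].append(doc_id)
    return dict(index)


def search_all(index, query):
    """Return IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


docs = {
    1: "Inverted index for big data",
    2: "Big data retrieval systems",
    3: "Posting lists in an inverted index",
}
index = build_inverted_index(docs)
print(search_all(index, "big data"))
```

A lookup touches only the posting lists of the query terms, which is why retrieval avoids scanning every document.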

A Novel Reversible Data Hiding Scheme for VQ-Compressed Images Using Index Set Construction Strategy

Qin, Chuan;Chang, Chin-Chen;Chen, Yen-Chang
- KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.8 / pp.2027-2041 / 2013
In this paper, we propose a novel reversible data hiding scheme for the index tables of vector quantization (VQ) compressed images, based on an index set construction strategy. On the sender side, three index sets are constructed; the first and second sets contain the indices with higher and lower occurrence counts in the given VQ index table, respectively. The index values in the index table that belong to the second set are prefixed with values from the third set to avoid collisions with the two mapping sets derived from the first set, and this prefixing operation also provides additional data hiding capacity. The main data embedding procedure is achieved simply by mapping the index values in the first set to the corresponding values in the two derived mapping sets. Reconstructing the same three index sets on the receiver side ensures correct extraction of the secret data and lossless recovery of the index table. Experimental results demonstrate the effectiveness of the proposed scheme.
https://doi.org/10.3837/tiis.2013.08.016 KSCI

A Study on the Reliability Evaluation Index Development for the Information Resources Retained by Institutions: Focusing on Humanities Assets

Jeong, Dae-Keun;Noh, Younghee
- International Journal of Knowledge Content Development & Technology / v.9 no.2 / pp.65-89 / 2019
This study aims to develop an evaluation index for assessing the reliability of the information resources of institutions that retain humanities assets, in order to lay the foundation for a one-stop portal service for humanities assets. To this end, the evaluation index was derived through an analysis of previous research, case studies, and interviews with experts, and the derived index was then applied to institutions retaining humanities assets to verify its utility. The reliability evaluation index for institutional information resources consists of two dimensions: the institutions' own reliability evaluation index and the institution-provided service and system evaluation index. The institutions' own reliability evaluation index consists of 25 points for institutional authority, 25 points for data collection and construction, 30 points for data provision, and 20 points for appropriateness of data, for a total of 100 points. The institution-provided service and system evaluation index consists of 25 points for information quality, 15 points for appropriateness (decency), 15 points for accessibility, 20 points for tangibility, 15 points for form, and 10 points for cooperation, for a total of 100 points. The derived evaluation index was applied to 6 institutions representing humanities assets to evaluate its utility. The reliability of the information resources retained by the Research Information Service System (RISS) of the Korea Education & Research Information Service (KERIS) turned out to be the highest.
https://doi.org/10.5865/IJKCT.2019.9.2.065 KSCI
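
The point allocations listed in the abstract above can be checked and applied with a short sketch. The weights come from the abstract; how the study converts an institution's performance into points is not stated there, so the `ratings` scale (a fraction between 0 and 1 per criterion) and the function name `total_score` are assumptions made only for illustration.

```python
# Point allocations taken from the abstract (each dimension totals 100).
INSTITUTION_WEIGHTS = {
    "institutional authority": 25,
    "data collection and construction": 25,
    "data provision": 30,
    "appropriateness of data": 20,
}
SERVICE_WEIGHTS = {
    "information quality": 25,
    "appropriateness (decency)": 15,
    "accessibility": 15,
    "tangibility": 20,
    "form": 15,
    "cooperation": 10,
}


def total_score(weights, ratings):
    """Weighted score out of 100.

    `ratings` maps each criterion to a fraction in [0, 1]; this rating
    scale is a hypothetical choice, not the study's scoring procedure.
    """
    for name in weights:
        if not 0.0 <= ratings[name] <= 1.0:
            raise ValueError(f"rating for {name!r} must be in [0, 1]")
    return sum(weights[name] * ratings[name] for name in weights)


# Both dimensions should add up to 100 points.
print(sum(INSTITUTION_WEIGHTS.values()), sum(SERVICE_WEIGHTS.values()))

# Hypothetical institution rated at half marks on every criterion.
half = {name: 0.5 for name in SERVICE_WEIGHTS}
print(total_score(SERVICE_WEIGHTS, half))
```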

Geohashed Spatial Index Method for a Location-Aware WBAN Data Monitoring System Based on NoSQL

Li, Yan;Kim, Dongho;Shin, Byeong-Seok
- Journal of Information Processing Systems / v.12 no.2 / pp.263-274 / 2016
The exceptional development of electronic device technology, the miniaturization of mobile devices, and the development of telecommunication technology have made it possible to monitor human biometric data anywhere and anytime by using different types of wearable or embedded sensors. In daily life, mobile devices can collect wireless body area network (WBAN) data, and the location data collected along with it is also important for disease analysis. To analyze WBAN data that includes location information efficiently and to support medical analysis services, we propose a geohash-based spatial index method for a location-aware WBAN data monitoring system on a NoSQL database system. The method uses an R-tree-based global tree to organize the real-time location data of a patient and a B-tree-based local tree to manage historical data. This type of spatial index method supports a cloud-based location-aware WBAN data monitoring system. We built a system that supports JavaScript Object Notation (JSON) and Binary JSON (BSON) document data on mobile gateway devices. The proposed spatial index method can efficiently process location-based queries for medical signal monitoring. To evaluate the index method, we simulated a small system with the proposed method on MongoDB, a document-based NoSQL database system, and evaluated its performance.
https://doi.org/10.3745/JIPS.04.0025 KSCI
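
The geohash encoding that the abstract above builds on can be sketched as follows. This shows only the standard geohash algorithm (interleaving longitude and latitude bits and emitting base-32 characters); it does not reproduce the paper's R-tree global tree, B-tree local tree, or MongoDB setup. The function name `geohash_encode` is hypothetical.

```python
# Standard geohash base-32 alphabet (no "a", "i", "l", "o").
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=5))  # → u4pru
```

Because a shorter geohash is always a prefix of a longer one for the same point, nearby locations tend to share prefixes, so a prefix match on a string-indexed key can serve as a coarse spatial filter.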

An Investigation of the Cooperative Relationships in the ILL Services of Academic Libraries by Applying the Collaboration Index - Focusing on the S University Library in Korea - (협업지수를 응용한 대학도서관 상호대차 협력 관계 분석 - S대학교 도서관을 중심으로 -)

Yook, Jihye;Lee, Go Eun;Park, Ji-Hong
- Journal of Korean Library and Information Science Society / v.46 no.4 / pp.493-510 / 2015
The purpose of this study is to analyze cooperative relationships and information needs using the interlibrary loan (ILL) service data of academic libraries. The study interprets the ILL service data as an information source that includes unsolved information problems, and it normalizes the ILL service data using the collaboration index. The results have three aspects. First, the col hs-index can be a useful tool for analyzing ILL service relationships between libraries of different sizes. Second, the study identifies the information needs and collection characteristics of each library by analyzing the ILL data by subject. Third, by applying the col hs-index, we could analyze the ILL data more objectively and identified the possibility of a bibliographic index.
https://doi.org/10.16981/kliss.46.201512.493 KSCI

A Method of Data Hiding in a File System by Modifying Directory Information

Cho, Gyu-Sang
- Journal of the Korea Society of Computer and Information / v.23 no.8 / pp.85-93 / 2018
This research proposes a method to hide data by modifying directory index entry information. It consists of two methods: directory list hiding and file contents hiding. The directory list hiding method prevents the list of files from appearing in the file explorer window or the command prompt window. The file names of several index entries are modified so that they are duplicated; if the duplicated files are deleted, only the original file is deleted, while the modified files are retained intact in the MFT entry. Thus, the fact that these files are hidden is not exposed. The file contents hiding method allocates the data to be hidden to an unused, empty index record page. If many files are created in a directory, several 4 KB index records are allocated, and NTFS leaves the empty index records unchanged after the files are deleted. By modifying the run-list of the index record with the cluster number of the file-to-hide, the contents of the file-to-hide are hidden in the index record. When the proposed method was applied to a case of hiding two files, the file lists were not exposed in the file explorer or the command prompt window, and the contents of the file-to-hide were hidden in the empty index record. The results indicate that the proposed method is effective and valid.
https://doi.org/10.9708/jksci.2018.23.08.085 KSCI

A Study on the Safety Index Service Model by Disaster Sector using Big Data Analysis (빅데이터 분석을 활용한 재해 분야별 안전지수 서비스 모델 연구)

Jeong, Myoung Gyun;Lee, Seok Hyung;Kim, Chang Soo
- Journal of the Society of Disaster Information / v.16 no.4 / pp.682-690 / 2020
Purpose: This study builds a database by collecting and refining disaster occurrence data and real-time weather and atmospheric data. In conjunction with public data provided through an API, we propose a service model for a big data-based urban safety index. Method: The plan is to provide a way to collect various information related to disaster occurrence by utilizing public data and SNS, and to identify and cope with disaster situations in areas of interest through real-time dashboards. Result: By extracting the characteristics of the relationship between the local safety index and weather and atmospheric data by area and comparing them with the prediction model, we confirmed that the regional safety index for traffic accidents has a significant correlation with weather and atmospheric data. Conclusion: We propose a system that generates a prediction model for the safety index based on a machine learning algorithm and displays the safety index by sector on a map for the areas of interest to users.
https://doi.org/10.15683/kosdi.2020.12.31.682 KSCI

An Index Splitting Technique for Numerous Sensor Data Archiving (대용량 센서 데이터 아카이빙을 위한 색인 분할 기법)

Cho, Dae-Soo
- Journal of Korea Spatial Information System Society / v.9 no.1 / pp.31-43 / 2007
Sensor data are characteristically numerous and continuous. Therefore, an index is required that can efficiently retrieve specific sensor data from a large volume of sensed data. The index should also provide an efficient delete operation for past data in order to support data archiving. In this paper, we propose and implement an index splitting technique to support sensor data archiving. The split indexes compose a virtual index (that is, an index management component), which appears as a single tree from the outside. Experimental results show that for 100,000 insert operations the split index performs up to 8% better than the traditional TB-tree. The split index also outperforms the TB-tree on retrieval queries when the query region is small and the time domain is large.
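
The idea of splitting an index so that past data can be deleted cheaply can be sketched under strong simplifications. The class below partitions records by fixed time windows and exposes them as one logical index; it uses plain Python lists instead of a TB-tree and is not the paper's implementation. The class name `SplitIndex`, the integer timestamps, and the `window` parameter are all assumptions for illustration.

```python
from collections import defaultdict


class SplitIndex:
    """Time-partitioned index that looks like a single index to callers."""

    def __init__(self, window):
        self.window = window  # width of each time partition (integer units)
        self.parts = defaultdict(list)  # partition key -> list of records

    def insert(self, timestamp, sensor_id, value):
        """Route the record to the partition that covers its timestamp."""
        self.parts[timestamp // self.window].append((timestamp, sensor_id, value))

    def query(self, t_start, t_end):
        """Return records with t_start <= timestamp <= t_end, sorted."""
        hits = []
        # Only partitions overlapping the time range are visited.
        for key in range(t_start // self.window, t_end // self.window + 1):
            for record in self.parts.get(key, []):
                if t_start <= record[0] <= t_end:
                    hits.append(record)
        return sorted(hits)

    def archive_before(self, timestamp):
        """Detach every partition that lies entirely before `timestamp`."""
        cutoff = timestamp // self.window
        archived = []
        for key in sorted(k for k in self.parts if k < cutoff):
            # Dropping a whole partition avoids record-by-record deletes.
            archived.extend(self.parts.pop(key))
        return archived


idx = SplitIndex(window=10)
for t, sensor, value in [(1, "a", 1.0), (5, "b", 2.0), (12, "a", 3.0), (25, "a", 4.0)]:
    idx.insert(t, sensor, value)
old = idx.archive_before(20)
print(len(old), idx.query(0, 30))
```

In this sketch archiving removes whole partitions at once, which is the property the abstract asks of the delete operation for past data.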

Efficient Index Reconstruction Methods using a Partial Index in a Spatial Data Warehouse (공간 데이터 웨어하우스에서 부분 색인을 이용한 효율적인 색인 재구축 기법)

Kwak, Dong-Uk;Jeong, Young-Cheol;You, Byeong-Seob;Kim, Jae-Hong;Bae, Hae-Young
- Journal of Korea Spatial Information System Society / v.7 no.3 s.15 / pp.119-130 / 2005
A spatial data warehouse is a system that stores geographical information as a subject-oriented, integrated, time-variant, non-volatile collection in order to support decision making efficiently. The system consists of a builder and a spatial data warehouse server. The spatial data warehouse server suspends user services, stores the transferred data in the data repository, and constructs an index on the stored data to shorten response time. The existing index construction methods are the bulk-insertion method and the index transfer method. The bulk-insertion method incurs high clustering and search costs when constructing the index. The index transfer method is unsuitable for index reconstruction in a spatial data warehouse, where source data are inserted periodically. This paper proposes an efficient index reconstruction method that uses a partial index in a spatial data warehouse. The method transfers a partial index and stores it at an expected physical location. It clusters the spatial data so that it is suitable for index construction, converts the processed clusters into a partial index, and transfers the pages that store the partial index. The spatial data warehouse server reserves contiguous physical space on disk and stores the partial index in the reserved space. By inserting the partial index into the index already constructed in the spatial data warehouse server, the costs of searching, splitting, and re-modification are minimized.

Better Bootstrap Confidence Intervals for Process Incapability Index $C_{pp}$

Cho, Joong-Jae;Han, Jeong-Hye;Lee, In-Pyo
- Journal of the Korean Data and Information Science Society / v.10 no.2 / pp.341-357 / 1999
Greenwich and Jahr-Schaffrath (1995) introduced a new process incapability index (PII) $C_{pp}$, which modifies the useful index $C^{\ast}_{pm}$ for detecting assignable causes. The new index $C_{pp}$ provides an uncontaminated separation between information concerning process accuracy and process precision, whereas this kind of separation is not available with the $C^{\ast}_{pm}$ index. In this paper, we study the index $C_{pp}$ based on the bootstrap. First, we prove the consistency of the bootstrap by deriving the bootstrap asymptotic distribution for the index $C_{pp}$. Moreover, using this consistency, we construct six bootstrap confidence intervals and compare their performance. Simulation results, comparisons, and analysis are provided. In particular, the two bootstrap methods STUD and ABC perform significantly better.
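
A bootstrap confidence interval for $C_{pp}$ can be sketched as follows. The abstract does not state the formula, so the code assumes the commonly cited definition $C_{pp} = ((\mu - T)/D)^2 + (\sigma/D)^2$ with $D = d/3$ and $d = \min(USL - T,\ T - LSL)$, estimated with the sample mean and the sample variance. It implements only a plain percentile bootstrap, not the STUD or ABC methods the paper highlights; the function names and sample data are hypothetical.

```python
import random
import statistics


def cpp(sample, target, lsl, usl):
    """Estimate C_pp as inaccuracy plus imprecision (assumed definition).

    C_pp = ((mean - T) / D)^2 + s^2 / D^2, with D = min(USL - T, T - LSL) / 3.
    """
    big_d = min(usl - target, target - lsl) / 3
    mean = statistics.fmean(sample)
    var = statistics.variance(sample)  # sample variance, n - 1 denominator
    return ((mean - target) / big_d) ** 2 + var / big_d ** 2


def percentile_bootstrap_ci(sample, target, lsl, usl,
                            n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for C_pp (a basic variant only)."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(sample)
    # Resample with replacement and recompute the index each time.
    stats = sorted(
        cpp(rng.choices(sample, k=n), target, lsl, usl) for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical measurements with target 10 and specification limits 7 and 13.
data = [9.6, 10.2, 10.1, 9.8, 10.4, 9.9, 10.3, 10.0, 9.7, 10.5]
print(cpp(data, target=10, lsl=7, usl=13))
print(percentile_bootstrap_ci(data, target=10, lsl=7, usl=13))
```

The two terms of the estimate correspond to the accuracy and precision components that the abstract says $C_{pp}$ keeps separate.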

Search Result 2,716, Processing Time 0.029 seconds