An Efficient Information Retrieval System for Unstructured Data Using Inverted Index

  • Submitted: 2024.07.05
  • Published: 2024.07.30

Abstract

An inverted index combines keywords with posting lists. Each posting list associates a keyword with the documents that contain it, and the structure is used for document indexing. In the modern age, heavy use of technology has increased data volume at a very high rate, and big data is a major concern for researchers. Efficient document indexing over big data in particular has become a major challenge. Organizations and web search engines have limited resources, such as space and storage, and these resources are critical to data management in an information retrieval system. Such systems therefore need to be highly efficient. This research introduces the inverted indexing technique to minimize the delay in retrieving data in an information retrieval system. The inverted index is illustrated, its issues are discussed, and these issues are resolved by implementing a scalable inverted index. The existing inverted index algorithm is then compared with the naïve inverted index. The interval lists of the inverted index are stored in primary storage instead of auxiliary memory. Finally, this research proposes an efficient information retrieval architecture designed particularly for unstructured data, which has no predefined structural format or data volume.
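The keyword-to-posting-list structure described above can be sketched as follows. This is a minimal in-memory illustration, not the authors' implementation or their scalable index. The tokenizer, the function names (`build_inverted_index`, `intersect`, `search_all`) and the sample documents are hypothetical. The whole index is assumed to fit in main memory; a scalable version would need partitioning or compression, which this sketch does not attempt.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase text and split it into alphanumeric keywords (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each keyword to a sorted posting list of the IDs of documents containing it."""
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a: list[int], b: list[int]) -> list[int]:
    """Merge-intersect two sorted posting lists in O(len(a) + len(b)) time."""
    i = j = 0
    out: list[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_all(index: dict[str, list[int]], query: str) -> list[int]:
    """Return IDs of documents that contain every query keyword (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    # Process the shortest posting list first to keep intermediate results small.
    lists = sorted((index.get(t, []) for t in terms), key=len)
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
        if not result:
            break
    return result


if __name__ == "__main__":
    docs = {
        1: "Big data needs efficient indexing",
        2: "An inverted index maps keywords to posting lists",
        3: "Efficient retrieval of unstructured big data",
    }
    index = build_inverted_index(docs)
    print(index["big"])                             # [1, 3]
    print(search_all(index, "efficient big data"))  # [1, 3]
```

Intersecting the shortest posting list first is a common optimization for conjunctive queries, because the result can never be longer than the smallest input list.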

Acknowledgement

The authors are grateful to Dr. Muhammad Irfan Khan, Department of Computer Science, GC University Faisalabad, for his valuable suggestions on the data analysis.