Access efficiency of small sized files in Big Data using various Techniques on Hadoop Distributed File System platform

  • Alange, Neeta (Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation) ;
  • Mathur, Anjali (Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation)
  • Received : 2021.07.05
  • Published : 2021.07.30


Hadoop usage has grown steadily in recent years, driven by worldwide demand for fast access to data. Dependence on computers keeps increasing, and big data is growing exponentially as more of the world's activity moves online. The large volumes of data being produced are difficult to handle and process within a short time. Industry now widely uses the Hadoop framework to store and process the huge amounts of data placed on servers and to deliver results on time. When that data consists of many small files, processing it and optimizing its storage become a major problem. Several techniques have already been proposed, including HDFS, Sequence files, HAR, and NHAR. In this paper we discuss the existing techniques developed for storing and accessing small files efficiently. Of these techniques, we have specifically attempted to implement HDFS-HAR and NHAR.
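The general idea shared by archive-style approaches to the small-file problem, merging many small files into one larger container and keeping an index for direct lookup, can be illustrated with a minimal sketch. This is not the on-disk format of HAR, NHAR, or Sequence files, and it does not use any Hadoop API; the function names `pack_small_files` and `read_small_file` and the sample file names are hypothetical and chosen only for illustration.

```python
import io

def pack_small_files(files):
    """Concatenate many small payloads into one blob and build an index.

    files: dict mapping a file name to its bytes content (hypothetical input).
    Returns (blob, index), where index maps name -> (offset, length).
    """
    buffer = io.BytesIO()
    index = {}
    for name, data in files.items():
        # Record where this file starts and how long it is before writing it.
        index[name] = (buffer.tell(), len(data))
        buffer.write(data)
    return buffer.getvalue(), index

def read_small_file(blob, index, name):
    """Fetch one small file from the merged blob using only the index."""
    offset, length = index[name]
    return blob[offset:offset + length]

# Three tiny files become one stored object plus a small index.
small_files = {"a.txt": b"alpha", "b.txt": b"bravo!", "c.txt": b"c"}
blob, index = pack_small_files(small_files)
print(read_small_file(blob, index, "b.txt"))
```

In this sketch the storage layer would track one merged object instead of three separate files, at the cost of an extra index lookup on every read.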


