Vision-Based Activity Recognition Monitoring Based on Human-Object Interaction at Construction Sites

  • Chae, Yeon (Department of Architecture and Architectural Engineering, Seoul National University)
  • Lee, Hoonyong (Department of Construction Science, College of Architecture, Texas A&M University)
  • Ahn, Changbum R. (Department of Architecture and Architectural Engineering, Seoul National University)
  • Jung, Minhyuk (Department of Architecture and Architectural Engineering, Seoul National University)
  • Park, Moonseo (Department of Architecture and Architectural Engineering, Seoul National University)
  • Published : 2022.06.20

Abstract

Vision-based activity recognition has been widely attempted at construction sites to estimate productivity and enhance workers' health and safety. Previous studies have focused on extracting an individual worker's postural information from sequential image frames for activity recognition. However, workers of different trades perform different tasks with similar postural patterns, which degrades the performance of activity recognition based on postural information. To address this limitation, this research exploited the concept of human-object interaction, the interaction between a worker and the surrounding objects, considering that trade workers interact with specific objects (e.g., working tools or construction materials) relevant to their trades. This research developed an approach to understand the context from sequential image frames based on four features: posture, object, spatial, and temporal features. The posture and object features were used to analyze the interaction between the worker and the target object, and the other two features were used to detect movements across the entire region of the image frames in both the temporal and spatial domains. The developed approach used convolutional neural networks (CNNs) as feature extractors and activity classifiers, and long short-term memory (LSTM) as an additional activity classifier. The developed approach achieved an average accuracy of 85.96% in classifying 12 target construction tasks performed by two trades of workers, which was higher than that of two benchmark models. This experimental result indicates that integrating the concept of human-object interaction offers great benefits for activity recognition when workers of various trades coexist in a scene.
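
The abstract does not specify how the interaction between a worker and a target object is encoded, so the following is only a minimal sketch of one common way to describe a worker-object pair from bounding boxes. It is not the authors' method. The `Box` class, the `iou` and `interaction_features` functions, and the choice of three geometric cues (overlap, normalized center distance, relative object size) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical helper)."""
    x1: float
    y1: float
    x2: float
    y2: float

    @property
    def center(self):
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

    @property
    def area(self):
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a, b):
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    inter_w = min(a.x2, b.x2) - max(a.x1, b.x1)
    inter_h = min(a.y2, b.y2) - max(a.y1, b.y1)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    return inter / (a.area + b.area - inter)


def interaction_features(worker, obj, frame_w, frame_h):
    """Assumed geometric cues for one worker-object pair in one frame.

    Returns [overlap, center distance / frame diagonal, object area / worker area].
    """
    wx, wy = worker.center
    ox, oy = obj.center
    distance = hypot(ox - wx, oy - wy) / hypot(frame_w, frame_h)
    relative_size = obj.area / worker.area if worker.area > 0 else 0.0
    return [iou(worker, obj), distance, relative_size]


def nearest_object(worker, objects, frame_w, frame_h):
    """Pick the label of the object closest to the worker (assumed heuristic)."""
    return min(
        objects,
        key=lambda label: interaction_features(
            worker, objects[label], frame_w, frame_h
        )[1],
    )
```

For example, `nearest_object(Box(0, 0, 10, 20), {"tool": Box(5, 10, 15, 20), "material": Box(80, 80, 95, 95)}, 100, 100)` selects `"tool"`, since its center lies closer to the worker. In a pipeline like the one described, such per-frame cues could be concatenated with posture and object features and passed to a sequence classifier, but that wiring is likewise an assumption here.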

Acknowledgement

This work was supported by the BK21 FOUR (Fostering Outstanding Universities for Research) Project in 2022 (No.4120200113771).