Egocentric Vision for Human Activity Recognition Using Deep Learning

  • Malika Douache (Automation, Vision and Intelligent Systems Control Laboratory, University of Sciences and Technology of Oran Mohamed-Boudiaf (USTOMB));
  • Badra Nawal Benmoussat (Automation, Vision and Intelligent Systems Control Laboratory, University of Sciences and Technology of Oran Mohamed-Boudiaf (USTOMB))
  • Received : 2022.05.27
  • Accepted : 2022.09.09
  • Published : 2023.12.31

Abstract

This paper addresses the recognition of human activities from egocentric vision, in particular video captured by body-worn cameras, which can support video surveillance, automatic search, and video indexing. It can also help assist elderly and frail persons and markedly improve their daily lives, for example when recognition is performed by an external device, such as a robot, acting as a personal assistant: the inferred information is used both online to assist the person and offline to support the assistant. The task of human activity recognition nevertheless remains difficult because of the large variations in the way actions are executed. The main purpose of this paper is therefore a simple and efficient recognition method that uses egocentric camera data only, is based on a convolutional neural network and deep learning, and is robust to these factors of variability in action execution. In terms of accuracy, simulation results outperform the current state of the art by a significant margin: 61% when using egocentric camera data only, more than 44% when using the egocentric camera together with several stationary cameras, and more than 12% when using both inertial measurement unit (IMU) and egocentric camera data.
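
To make the approach concrete, below is a minimal, hypothetical sketch of a frame-level convolutional classifier for egocentric video, written in Python with PyTorch. It assumes frames have already been extracted from the egocentric recordings and labeled with activity classes; the layer sizes, optimizer settings, input resolution, and number of classes are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch only: a small frame-level CNN activity classifier.
# Architecture, optimizer settings, and class count are illustrative assumptions.
import torch
import torch.nn as nn

NUM_CLASSES = 5  # assumed number of activity classes

class FrameCNN(nn.Module):
    """Classifies a single egocentric RGB frame into an activity class."""

    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        # Stacked convolution -> ReLU -> pooling blocks extract visual features.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: batch of RGB frames with shape (N, 3, H, W)
        feats = self.features(x).flatten(1)   # -> (N, 64)
        return self.classifier(feats)         # -> (N, num_classes) class scores

model = FrameCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # SGD with momentum
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a dummy batch of 224x224 frames.
frames = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (8,))
optimizer.zero_grad()
loss = criterion(model(frames), labels)
loss.backward()
optimizer.step()
```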

Acknowledgement

The data used in this paper was obtained from kitchen.cs.cmu.edu and the data collection was funded in part by the National Science Foundation (Grant No. EEEC-0540865).
