Funding
The data used in this paper was obtained from kitchen.cs.cmu.edu and the data collection was funded in part by the National Science Foundation (Grant No. EEEC-0540865).
References
- C. Jobanputra, J. Bavishi, and N. Doshi, "Human activity recognition: a survey," Procedia Computer Science, vol. 155, pp. 698-703, 2019. https://doi.org/10.1016/j.procs.2019.08.100
- T. Alhersh, H. Stuckenschmidt, A. U. Rehman, and S. B. Belhaouari, "Learning human activity from visual data using deep learning," IEEE Access, vol. 9, pp. 106245-106253, 2021. https://doi.org/10.1109/ACCESS.2021.3099567
- Carnegie Mellon University, "CMU-Multimodal Activity (CMU-MMAC) database," 2010 [Online]. Available: http://kitchen.cs.cmu.edu/main.php.
- E. H. Spriggs, F. De La Torre, and M. Hebert, "Temporal segmentation and activity classification from first-person sensing," in Proceedings of 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Miami, FL, 2009, pp. 17-24. https://doi.org/10.1109/CVPRW.2009.5204354
- A. Fathi, A. Farhadi, and J. M. Rehg, "Understanding egocentric activities," in Proceedings of 2011 International Conference on Computer Vision, Barcelona, Spain, 2011, pp. 407-414. https://doi.org/10.1109/ICCV.2011.6126269
- A. Fathi, Y. Li, and J. Rehg, "Learning to recognize daily actions using gaze," in Computer Vision - ECCV 2012. Heidelberg, Germany: Springer, 2012, pp. 314-327. https://doi.org/10.1007/978-3-642-33718-5_23
- H. Pirsiavash and D. Ramanan, "Detecting activities of daily living in first-person camera views," in Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, 2012, pp. 2847-2854. https://doi.org/10.1109/CVPR.2012.6248010
- M. S. Ryoo and L. Matthies, "First-person activity recognition: What are they doing to me?," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, 2013, pp. 2730-2737. https://doi.org/10.1109/CVPR.2013.352
- S. Song, V. Chandrasekhar, N. M. Cheung, S. Narayan, L. Li, and J. H. Lim, "Activity recognition in egocentric life logging videos," in Computer Vision - ACCV 2014 Workshops. Cham, Switzerland: Springer, 2014, pp. 445-458. https://doi.org/10.1007/978-3-319-16634-6_33
- B. Soran, A. Farhadi, and L. Shapiro, "Action recognition in the presence of one egocentric and multiple static cameras," in Computer Vision - ACCV 2014. Cham, Switzerland: Springer, 2015, pp. 178-193. https://doi.org/10.1007/978-3-319-16814-2_12
- M. S. Ryoo, B. Rothrock, and L. Matthies, "Pooled motion features for first-person videos," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, 2015, pp. 896-904. https://doi.org/10.1109/CVPR.2015.7298691
- Y. Li, Z. Ye, and J. M. Rehg, "Delving into egocentric actions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, 2015, pp. 287-295. https://doi.org/10.1109/CVPR.2015.7298625
- M. Ma, H. Fan, and K. M. Kitani, "Going deeper into first-person activity recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, 2016, pp. 1894-1903. https://doi.org/10.1109/CVPR.2016.209
- S. Song, N. M. Cheung, V. Chandrasekhar, B. Mandal, and J. Lin, "Egocentric activity recognition with multimodal fisher vector," in Proceedings of 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 2016, pp. 2717-2721. https://doi.org/10.1109/ICASSP.2016.7472171
- S. Song, V. Chandrasekhar, B. Mandal, L. Li, J. H. Lim, G. Sateesh Babu, P. P. San, and N. M. Cheung, "Multimodal multi-stream deep learning for egocentric activity recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, 2016, pp. 24-31. https://doi.org/10.1109/CVPRW.2016.54
- S. Singh, C. Arora, and C. V. Jawahar, "First person action recognition using deep learned descriptors," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, 2016, pp. 2620-2628. https://doi.org/10.1109/CVPR.2016.287
- L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool, "Temporal segment networks: towards good practices for deep action recognition," in Computer Vision - ECCV 2016. Cham, Switzerland: Springer, 2016, pp. 20-36. https://doi.org/10.1007/978-3-319-46484-8_2
- E. A. Khalid, A. Hamid, A. Brahim, and O. Mohammed, "A survey of activity recognition in egocentric lifelogging datasets," in Proceedings of 2017 International Conference on Wireless Technologies, Embedded and Intelligent Systems (WITS), Fez, Morocco, 2017, pp. 1-8. https://doi.org/10.1109/WITS.2017.7934659
- S. Singh, C. Arora, and C. V. Jawahar, "Trajectory aligned features for first person action recognition," Pattern Recognition, vol. 62, pp. 45-55, 2017. https://doi.org/10.1016/j.patcog.2016.07.031
- Y. Liu, P. Wei, and S. C. Zhu, "Jointly recognizing object fluents and tasks in egocentric videos," in Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017, pp. 2924-2932. https://doi.org/10.1109/ICCV.2017.318
- R. Possas, S. P. Caceres, and F. Ramos, "Egocentric activity recognition on a budget," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, 2018, pp. 5967-5976. https://doi.org/10.1109/CVPR.2018.00625
- Y. Li, M. Liu, and J. M. Rehg, "In the eye of beholder: joint learning of gaze and actions in first person video," in Computer Vision - ECCV 2018. Cham, Switzerland: Springer, 2018, pp. 619-635. https://doi.org/10.1007/978-3-030-01228-1_38
- S. Sudhakaran and O. Lanz, "Attention is all we need: nailing down object-centric attention for egocentric activity recognition," 2018 [Online]. Available: https://arxiv.org/abs/1807.11794.
- S. Sudhakaran, S. Escalera, and O. Lanz, "LSTA: long short-term attention for egocentric action recognition," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, 2019, pp. 9954-9963. https://doi.org/10.1109/CVPR.2019.01019
- E. Kazakos, A. Nagrani, A. Zisserman, and D. Damen, "EPIC-fusion: audio-visual temporal binding for egocentric action recognition," in Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 2019, pp. 5492-5501. https://doi.org/10.1109/ICCV.2019.00559
- Y. Lu and S. Velipasalar, "Autonomous human activity classification from ego-vision camera and accelerometer data," 2019 [Online]. Available: https://arxiv.org/abs/1905.13533.
- A. Diete and H. Stuckenschmidt, "Fusing object information and inertial data for activity recognition," Sensors, vol. 19, no. 19, article no. 4119, 2019. https://doi.org/10.3390/s19194119
- A. Furnari and G. M. Farinella, "Rolling-unrolling LSTMs for action anticipation from first-person video," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 11, pp. 4021-4036, 2021. https://doi.org/10.1109/TPAMI.2020.2992889
- I. Rodin, A. Furnari, D. Mavroeidis, and G. M. Farinella, "Scene understanding and interaction anticipation from first person vision," in Proceedings of the 1st Workshop on Smart Personal Health Interfaces co-located with 25th International Conference on Intelligent User Interfaces, Cagliari, Italy, 2020, pp. 78-83.
- K. Min and J. J. Corso, "Integrating human gaze into attention for egocentric activity recognition," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, 2021, pp. 1069-1078. https://doi.org/10.1109/WACV48630.2021.00111
- F. Ragusa, A. Furnari, S. Livatino, and G. M. Farinella, "The MECCANO dataset: understanding human-object interactions from egocentric videos in an industrial-like domain," in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, 2021, pp. 1568-1577. https://doi.org/10.1109/WACV48630.2021.00161
- A. Ghosh, A. Sufian, F. Sultana, A. Chakrabarti, and D. De, "Fundamental concepts of convolutional neural network," in Recent Trends and Advances in Artificial Intelligence and Internet of Things. Cham, Switzerland: Springer, 2020, pp. 519-567. https://doi.org/10.1007/978-3-030-32644-9_36
- S. Saha, "A comprehensive guide to convolutional neural networks: the ELI5 way," 2018 [Online]. Available: https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53.
- DVDVideoSoft, "Free Video to JPG converter," 2022 [Online]. Available: https://www.dvdvideosoft.com/products/dvd/Free-Video-to-JPG-Converter.htm.
- MathWorks, "TrainingOptionsSGDM: training options for stochastic gradient descent with momentum," c2023 [Online]. Available: https://fr.mathworks.com/help/deeplearning/ref/nnet.cnn.trainingoptionssgdm.html.
- S. Shi, "On the hyperparameters in stochastic gradient descent with momentum," 2021 [Online]. Available: https://arxiv.org/abs/2108.03947.