Automatic Gesture Recognition for Human-Machine Interaction: An Overview

  • Nataliia Konkina (Department of Automation of Power Processes and Systems Engineering (APEPS), Faculty of Heat Power Engineering, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute")
  • Received : 2021.12.05
  • Published : 2022.01.30

Abstract

As we rely more and more on computing systems in everyday life, there is a constant need to make interaction with them more natural, effective, and convenient. In the early days of computing, interaction between humans and machines was limited, and machines were not designed to be intelligent. This created the need for systems that can automatically identify and interpret human actions. Automatic gesture recognition is one popular approach that lets users control systems with their gestures. It covers various kinds of tracking, including tracking of the whole body, hands, head, and face. We also touch upon related lines of work, such as brain-computer interfaces (BCI) and electromyography (EMG), as potential additions to gesture recognition. In this work, we present an overview of several applications of automatic gesture recognition systems and take a brief look at the popular methods they employ.
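As a toy illustration only (not the method of this overview or of any particular system), the sketch below shows the last step of a very simple gesture interface: turning a tracked hand trajectory into a control command. It assumes some hand tracker already supplies the hand position as normalized (x, y) image coordinates, with y growing downward. The function `classify_swipe`, the `min_distance` threshold, and the `COMMANDS` mapping are hypothetical names chosen for this example.

```python
from typing import Dict, List, Tuple

# Assumed input format: (x, y) in normalized image coordinates, y grows downward.
Point = Tuple[float, float]


def classify_swipe(track: List[Point], min_distance: float = 0.2) -> str:
    """Map a tracked hand trajectory to a coarse swipe gesture.

    Returns "left", "right", "up", "down", or "none" when the net
    displacement is shorter than min_distance (a hypothetical threshold).
    """
    if len(track) < 2:
        return "none"
    # Net displacement between the first and last tracked positions.
    dx = track[-1][0] - track[0][0]
    dy = track[-1][1] - track[0][1]
    # Ignore small movements such as hand jitter.
    if (dx * dx + dy * dy) ** 0.5 < min_distance:
        return "none"
    # The dominant axis decides between horizontal and vertical swipes.
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    # Image y grows downward, so a positive dy means the hand moved down.
    return "down" if dy > 0 else "up"


# Hypothetical mapping from recognized gestures to system commands.
COMMANDS: Dict[str, str] = {
    "left": "previous_slide",
    "right": "next_slide",
    "up": "volume_up",
    "down": "volume_down",
    "none": "no_op",
}

if __name__ == "__main__":
    track = [(0.30, 0.50), (0.45, 0.52), (0.70, 0.49)]
    gesture = classify_swipe(track)
    print(gesture, "->", COMMANDS[gesture])  # right -> next_slide
```

A hand-written rule like this only handles a few coarse gestures; practical systems typically replace such thresholds with classifiers learned from data, which cope better with many gesture classes and noisy tracking.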
