Real time instruction classification system

  • Sang-Hoon Lee (Korea Institute of Science and Technology)
  • Dong-Jin Kwon (Department of Computer Electronics Engineering, Seoil University)
  • Received : 2024.06.12
  • Accepted : 2024.06.26
  • Published : 2024.08.31

Abstract

With recent societal and technological progress, AI has made significant strides, especially in computer vision and voice recognition. This study introduces a system that applies these technologies to recognize users through a camera and relay voice commands within a vehicle. The system uses YOLO (You Only Look Once), a machine learning algorithm widely used for object recognition, to identify specific users. For voice command recognition, a machine learning model based on spectrogram analysis of speech identifies specific commands. This design aims to enhance security and convenience by preventing anyone other than registered users from accessing vehicles and IoT devices. The system converts camera input into YOLO input to determine whether the subject is a person. It also collects voice data through a microphone embedded in the device or computer and converts the time-domain signal into spectrogram data, which serves as input to the voice recognition model. The camera image data and voice data are passed through pre-trained models for inference, and the inference results enable recognition of simple commands within a limited space. This study demonstrates the feasibility of building a device management system for a confined space that enhances security and user convenience with a simple real-time system model. Finally, our work aims to provide practical solutions in application fields such as smart homes and autonomous vehicles.
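
The audio path and the gating step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `spectrogram` and `relay_command`, the frame length, hop size, and the 0.8 confidence threshold are all assumptions chosen for illustration, and the YOLO detector and the command classifier are represented only by their outputs (a boolean and a label with a confidence score). A synthetic tone stands in for microphone input.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return a magnitude spectrogram: a list of time frames, each a list of frequency bins."""
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT, O(frame_len^2); an FFT library would replace this in practice.
        mags = []
        for k in range(n_bins):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def relay_command(user_recognized, command, confidence, threshold=0.8):
    """Relay a command only if a registered user is in view and the classifier is confident.

    `user_recognized` stands in for the camera/YOLO result; `command` and
    `confidence` stand in for the voice model's output (both hypothetical).
    """
    if user_recognized and confidence >= threshold:
        return command
    return None


# Synthetic 1 kHz tone sampled at 8 kHz stands in for microphone input.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(512)]
spec = spectrogram(tone)

# Each bin spans SAMPLE_RATE / frame_len = 125 Hz, so the tone's energy
# should concentrate around 1000 / 125 = bin 8.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

In this sketch, `spec` would be the input handed to a pre-trained voice model, and `relay_command` returns `None` whenever no registered user is detected or the predicted command falls below the assumed confidence threshold, which mirrors the security goal stated in the abstract.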
