Real time instruction classification system

Sang-Hoon Lee; Dong-Jin Kwon

doi:10.7236/IJIBC.2024.16.3.212

International Journal of Internet, Broadcasting and Communication

Volume 16 Issue 3 / Pages 212-220 / 2024 / 2288-4920 (pISSN) / 2288-4939 (eISSN)

The Institute of Internet, Broadcasting and Communication (한국인터넷방송통신학회)


Sang-Hoon Lee (Korea Institute of Science and Technology);
Dong-Jin Kwon (Department of Computer Electronics Engineering, Seoil University)

Received : 2024.06.12
Accepted : 2024.06.26
Published : 2024.08.31

https://doi.org/10.7236/IJIBC.2024.16.3.212

Abstract

Recently, with the advancement of society, AI technology has made significant strides, especially in the fields of computer vision and voice recognition. This study introduces a system that leverages these technologies to recognize users through a camera and to relay voice commands within a vehicle. The system uses the YOLO (You Only Look Once) algorithm, widely used for object detection, to identify specific users. For voice command recognition, a machine learning model based on spectrogram analysis of the voice signal is employed to identify specific commands. This design aims to enhance security and convenience by preventing anyone other than registered users from accessing vehicles and IoT devices. The system converts camera input data into YOLO inputs to determine whether a person is present. In addition, it collects voice data through a microphone embedded in the device or computer and converts it into spectrogram (time-frequency) data, which serves as input to the voice recognition model. Inference is performed on the input camera images and voice data using pre-trained models, and the inference results are used to recognize simple commands within a limited space. This study demonstrates the feasibility of building a device management system for a confined space that improves security and user convenience with a simple real-time system model. Finally, our work aims to provide practical solutions in various application fields, such as smart homes and autonomous vehicles.
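The abstract states that microphone audio is converted into spectrogram data before classification. A minimal sketch of that conversion is shown below, assuming 16 kHz mono audio, 25 ms frames (400 samples) with a 10 ms hop (160 samples), a Hann window, a 512-point FFT and log-magnitude scaling. These parameters and the function name `log_spectrogram` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def log_spectrogram(signal, frame_len=400, hop=160, n_fft=512, eps=1e-10):
    """Log-magnitude short-time Fourier transform of a mono signal.

    Assumed parameters (not from the paper): 400-sample frames, 160-sample hop,
    Hann window, 512-point FFT. Returns an array of shape
    (num_frames, n_fft // 2 + 1): rows are time frames, columns are frequency bins.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.ndim != 1:
        raise ValueError("expected a mono 1-D signal")
    if len(signal) < frame_len:
        # Zero-pad very short clips so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    num_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Index matrix selecting overlapping frames: row i starts at sample i * hop.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    frames = signal[idx] * window
    # rfft zero-pads each 400-sample frame to n_fft samples; keep magnitudes only.
    spectrum = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))
    # Small epsilon avoids log(0) on silent frames.
    return np.log(spectrum + eps)
```

With these settings, a one-second clip at 16 kHz yields 98 frames by 257 frequency bins. A command classifier, for example a CNN or a spectrogram transformer, would treat this 2-D array as an image-like input.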

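The abstract describes two conditions for relaying a command: the camera must recognize a registered user, and the voice model must recognize a command. The paper does not specify how the two inference results are combined. One plausible gating rule is sketched below. The `Detection` type, the `relay_command` function, both thresholds, and the `"unknown"` background class are hypothetical. The step that parses raw YOLO output into label and confidence pairs is also assumed rather than shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Detection:
    label: str         # hypothetical class name produced by the YOLO detector
    confidence: float  # detection score in [0, 1]


def relay_command(
    detections: List[Detection],
    command_probs: Dict[str, float],
    registered_users: Set[str],
    det_threshold: float = 0.5,   # assumed value, not from the paper
    cmd_threshold: float = 0.6,   # assumed value, not from the paper
) -> Optional[str]:
    """Return the command to relay, or None if the request should be ignored."""
    # Camera gate: require at least one confident detection of a registered user.
    user_present = any(
        d.label in registered_users and d.confidence >= det_threshold
        for d in detections
    )
    if not user_present:
        return None
    if not command_probs:
        return None
    # Voice gate: pick the most probable command class from the classifier output.
    command, prob = max(command_probs.items(), key=lambda kv: kv[1])
    # Reject low-confidence predictions and the hypothetical background class.
    if prob < cmd_threshold or command == "unknown":
        return None
    return command
```

Gating on the camera result first means that audio from unregistered users is never turned into device actions. This matches the security goal stated in the abstract.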
References

O. Abdel-Hamid, A. Mohamed, H. Jiang, L. Deng, G. Penn and D. Yu, "Convolutional Neural Networks for Speech Recognition", IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), vol. 22, pp. 1533-1545, 2014. DOI: https://doi.org/10.1109/TASLP.2014.2339736
J. Kim, D. Y. Yun, O. S. Kwon, S. J. Moon and C. G. Hwang, "Comparative Analysis of Speech Recognition Open API Error Rate", International Journal of Advanced Smart Convergence (IJASC), vol. 10, pp. 79-85, 2021.
J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection", Computer Vision and Pattern Recognition (CVPR), ISSN 1063-6919, pp. 779-788, 2016. DOI: https://doi.org/10.1109/CVPR.2016.91
P. Warden, "Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition", arXiv:1804.03209, pp. 1-11, 2018. DOI: https://doi.org/10.48550/arXiv.1804.03209
Y. Zhang, B. Li, H. Fang and Q. Meng, "Spectrogram Transformers for Audio Classification", IEEE International Conference on Imaging Systems and Techniques (IST), 2022. DOI: https://doi.org/10.1109/IST55454.2022.9827729
J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger", Computer Vision and Pattern Recognition (CVPR), 2017. DOI: https://doi.org/10.1109/CVPR.2017.690
J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement", arXiv:1804.02767, pp. 1-6, 2018. DOI: https://doi.org/10.48550/arXiv.1804.02767
J. Liang, "Image classification based on RESNET", International Conference on Computer Information Science and Application Technology (CISAT), pp. 1-6, 2020. DOI: https://doi.org/10.1088/1742-6596/1634/1/012110
Z. Zhong, M. Zheng, H. Mai, J. Zhao and X. Liu, "Cancer image classification based on DenseNet model", International Conference on Artificial Intelligence Technologies and Application (ICAITA), pp. 1-6, 2020. DOI: https://doi.org/10.1088/1742-6596/1651/1/012143
J. Wang, L. Yang, Z. Huo, W. He and J. Luo, "Multi-Label Classification of Fundus Images With EfficientNet", IEEE Access, vol. 8, pp. 212499-212508, 2020. DOI: https://doi.org/10.1109/ACCESS.2020.3040275
M. Tan and Q. V. Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks", arXiv:1905.11946, pp. 1-11, 2019. DOI: https://doi.org/10.48550/arXiv.1905.11946
M. Halle and K. Stevens, "Speech recognition: A model and a program for research", IRE Transactions on Information Theory, vol. 8, pp. 155-159, 1962. DOI: https://doi.org/10.1109/TIT.1962.1057686