• Title/Summary/Keyword: Voice recognition system

Search Result 332, Processing Time 0.032 seconds

Design of an Visitor Identification system for the Front Door of an Apartment using Deep learning (딥러닝 기반 이용한 공동주택현관문의 출입자 식별 시스템 설계)

  • Lee, Min-Hye;Mun, Hyung-Jin
    • Journal of the Korea Convergence Society / v.13 no.4 / pp.45-51 / 2022
  • Fear of physical contact has grown as people try to prevent the spread of infectious diseases such as COVID-19. At the common entrance door of an apartment building, access is granted only when a person enters a password or obtains a resident's permission, so the unit number and password must be entered manually. This is inconvenient and runs counter to the need for contactless entry. Advances in ICT, in particular in face recognition and voice recognition technology, make it easy to identify users. The proposed method detects a visitor's face through a CCTV or other camera attached to the common entrance door, recognizes the face, and determines whether the visitor is a registered resident. Based on the resident's registered information, the server then interworks with the elevator so that the system can operate without contact. In particular, if face recognition fails because of a hat or mask, the visitor is identified by voice, or additional authentication is performed based on a voice message. The system can thus help block the spread of infection by providing contactless entry and exit at the front door of an apartment building, without leaving fingerprint information and without the inconvenience of manual access.

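A Study of Voice-Recognition-Based Interactive Media Art: Focusing on the Sound-Visual Interactive Installation "Water Music" (음성인식 기반 인터렉티브 미디어아트의 연구 - 소리-시각 인터렉티브 설치미술 "Water Music" 을 중심으로-)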

  • Lee, Myung-Hak;Jiang, Cheng-Ri;Kim, Bong-Hwa;Kim, Kyu-Jung
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회 학술대회논문집) / 2008.02a / pp.354-359 / 2008
  • This audio-visual interactive installation combines video projection and digital interface technology with recognition of the viewer's voice. The viewer can interact with the computer-generated moving images growing on the screen by blowing or making sounds. This symbiotic audio and visual installation environment allows viewers to experience an illusionistic space physically as well as psychologically. The main programming technologies used to generate the moving water waves that interact with the viewer are Visual C++ and the DirectX SDK. To create the water waves, full 3D rendering and a particle system were used.

A Study on the Development of Korea Telecom Automatic Voice Recognition System (음성인식에 의한 연구센타 부서안내 시스팀 개발에 관한 연구)

  • Koo, Myoung-Wan;Sohn, Il-Hyun;Doh, Sam-Joo;Lee, Jong-Rak
    • Annual Conference on Human and Language Technology / 1992.10a / pp.185-192 / 1992
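  • This paper describes KARS (Korea Telecom Automatic voice Recognition System), a department guidance system for a research center that uses speech recognition technology. The system is basically similar to a voice response system, but differs in that commands are entered by voice instead of push buttons. When a user speaks a command into a microphone, the system recognizes it and provides a brief introduction, the telephone number, and the location of each department in the research center. The system is a speaker-independent isolated-word recognizer based on HMMs (Hidden Markov Models) and can recognize 123 words, consisting of 116 department names and 7 control words. It uses phoneme-like Korean subwords as the basic HMM unit, and recognition experiments yielded a recognition rate of 98.6%.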
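
The isolated-word recognition described in this abstract can be illustrated with a minimal sketch: score the observation sequence against one HMM per vocabulary word using the forward algorithm and pick the best-scoring word. Everything concrete here is an assumption made for illustration and not taken from the paper: the two toy words (`word_a`, `word_b`), the discrete binary observation symbols, and the whole-word models (the paper builds its models from phoneme-like subword units).

```python
import math

def forward_log_likelihood(obs, model):
    """Return log P(obs | model) for a discrete HMM (scaled forward algorithm)."""
    start, trans, emit = model
    n = len(start)
    log_prob = 0.0
    alpha = []
    for t, symbol in enumerate(obs):
        if t == 0:
            # Initialization: start probability times emission probability.
            alpha = [start[s] * emit[s][symbol] for s in range(n)]
        else:
            # Induction: sum over predecessor states, then emit the symbol.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][symbol]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_prob

# Hypothetical two-state left-to-right models: (start, transitions, emissions).
MODELS = {
    "word_a": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
    "word_b": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
}

def recognize(obs):
    """Return the vocabulary word whose HMM gives obs the highest likelihood."""
    return max(MODELS, key=lambda word: forward_log_likelihood(obs, MODELS[word]))
```

Calling `recognize([0, 0, 1, 1])` selects `word_a`, whose states emit mostly 0 and then mostly 1. A real recognizer would replace the binary symbols with acoustic feature vectors and concatenate subword models into word models.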

Development of Joystick & Speech Recognition Moving Machine Control System (조이스틱 및 음성인식 겸용 이동기제어시스템 개발)

  • Lee, Sang-Bae;Kang, Sung-In
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.1 / pp.52-57 / 2007
  • This paper presents the design of an intelligent moving machine control system using real-time speech recognition. The proposed system is composed of four separate modules: a main control module, a speech recognition module, a servo motor driving module, and a sensor module. In the main control module, which is built around an 80C196KC microprocessor, fuzzy logic, a branch of artificial intelligence, is applied to the proposed intelligent control system. To compensate for the nonlinear characteristics that depend on the user's weight and the changing environment, encoders attached to the servo motors are used for feedback control. The proposed system is tested using nine words for controlling the mobile robot, and the performance of the mobile robot under voice and joystick commands is also evaluated.

Indexing and Retrieval of Human Individuals on Video Data Using Face and Speaker Recognition

  • Y.Sugiyama;N.Ishikawa;M.Nishida;Y.Ariki
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 1998.06b / pp.122-127 / 1998
  • In this paper, we focus on retrieving information about human individuals recorded in a video database. Our purpose is to index persons by their faces or voices and to retrieve the time sections in which they appear in the video data. The database system can track as well as extract the face or voice of a certain person and construct a model of that individual in a self-organizing mode. If the person appears again at a different time, the system can mark the associated frames as belonging to the same person. In this way, the same person can be retrieved even if the system does not know his or her exact name. For face and speaker modeling, a subspace method is employed to improve the indexing accuracy.

A Study on the Recognition of Face Based on CNN Algorithms (CNN 알고리즘을 기반한 얼굴인식에 관한 연구)

  • Son, Da-Yeon;Lee, Kwang-Keun
    • Korean Journal of Artificial Intelligence / v.5 no.2 / pp.15-25 / 2017
  • Recently, technologies that recognize and authenticate users by means of biometrics have been developed to address information security issues. Biometric information includes the face, fingerprint, iris, voice, and veins. Among these, face recognition technology accounts for a large share and is applied in various fields. For example, it can be used for identity verification in connection with personal identification cards, passports, credit cards, security systems, and personnel data. It can also be used for security purposes, including searching for crime suspects, monitoring unsafe zones, and tracking vehicles involved in crimes. In this thesis, we conducted a study on recognizing faces by detecting face regions through a computer webcam. The purpose of the study was to contribute to improving the accuracy of face recognition based on CNN algorithms. To this end, we used data files available on GitHub to build a face recognition model. We also created data using CNN algorithms, which are widely used for image recognition, and trained the CNN on a variety of photos. The study found that the accuracy of face recognition based on CNN algorithms was 77%. Based on these results, we also carried out face recognition at different distances. The findings may be useful where face recognition is required in a variety of situations, and further research building on this study is expected to improve the accuracy of face recognition.

Usability Test Guidelines for Speech-Oriented Multimodal User Interface (음성기반 멀티모달 사용자 인터페이스의 사용성 평가 방법론)

  • Hong, Ki-Hyung
    • MALSORI / no.67 / pp.103-120 / 2008
  • Basic components of a multimodal interface, such as speech recognition, speech synthesis, gesture recognition, and multimodal fusion, each have their own technological limitations. For example, the accuracy of speech recognition decreases with large vocabularies and in noisy environments. Despite these limitations, there are many applications in which speech-oriented multimodal user interfaces are very helpful to users. However, to expand the application areas of speech-oriented multimodal interfaces, the interfaces must be developed with a focus on usability. In this paper, we introduce usability and user-centered design methodology in general. Much work has been done on evaluating spoken dialogue systems. We summarize PARADISE (PARAdigm for Dialogue System Evaluation) and PROMISE (PROcedure for Multimodal Interactive System Evaluation), which are generalized evaluation frameworks for voice and multimodal user interfaces. We then present usability components for speech-oriented multimodal user interfaces and usability testing guidelines that can be used in a user-centered multimodal interface design process.

Introduction of ETRI Broadcast News Speech Recognition System (ETRI 방송뉴스음성인식시스템 소개)

  • Park, Jun
    • Proceedings of the KSPS conference / 2006.05a / pp.89-93 / 2006
  • This paper presents the ETRI broadcast news speech recognition system. There are two major issues in broadcast news speech recognition: 1) real-time processing and 2) out-of-vocabulary handling. For real-time processing, we devised a dual-decoder architecture. The input speech signal is segmented at the long pauses between utterances, and the two decoders process the speech segments alternately. One decoder can start recognizing the current speech segment without waiting for the other decoder to finish recognizing the previous segment, so processing delay does not accumulate. For out-of-vocabulary handling, we updated both the vocabulary and the language model based on recent news articles on the Internet. By updating the language model as well as the vocabulary, we improved performance by up to 17.2% ERR.
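
Why alternating decoders keeps delay from accumulating can be seen in a small scheduling simulation. This is only a sketch under stated assumptions, not the ETRI implementation: each segment is described by a hypothetical pair (time at which the segment becomes available, time needed to decode it), segments are assigned to decoders in strict round-robin order, and the timing values are invented for illustration.

```python
def segment_latencies(segments, num_decoders):
    """Simulate round-robin decoding and return each segment's latency.

    segments: list of (available_at, decode_time) pairs in seconds, in order.
    Latency is the time from a segment becoming available to its result.
    """
    free_at = [0.0] * num_decoders  # when each decoder becomes idle
    latencies = []
    for index, (available_at, decode_time) in enumerate(segments):
        decoder = index % num_decoders  # alternate between decoders
        start = max(available_at, free_at[decoder])
        free_at[decoder] = start + decode_time
        latencies.append(free_at[decoder] - available_at)
    return latencies

# Hypothetical load: a new segment every 2 s, each needing 3 s to decode.
segments = [(0.0, 3.0), (2.0, 3.0), (4.0, 3.0), (6.0, 3.0), (8.0, 3.0), (10.0, 3.0)]
single = segment_latencies(segments, num_decoders=1)
dual = segment_latencies(segments, num_decoders=2)
```

With these invented numbers, the single decoder falls further behind on every segment (latencies grow from 3 s to 8 s), while with two decoders every segment finishes 3 s after it becomes available.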

I-vector similarity based speech segmentation for interested speaker to speaker diarization system (화자 구분 시스템의 관심 화자 추출을 위한 i-vector 유사도 기반의 음성 분할 기법)

  • Bae, Ara;Yoon, Ki-mu;Jung, Jaehee;Chung, Bokyung;Kim, Wooil
    • The Journal of the Acoustical Society of Korea / v.39 no.5 / pp.461-467 / 2020
  • In noisy and multi-speaker environments, speech recognition performance is unavoidably lower than in a clean environment. To improve speech recognition, this paper extracts the signal of the speaker of interest from mixed speech signals containing multiple speakers. The VoiceFilter model is used to separate overlapped speech signals effectively. Clustering by Probabilistic Linear Discriminant Analysis (PLDA) similarity score is employed to detect the speech of the speaker of interest, which then serves as the reference for VoiceFilter-based separation. By using the speaker features extracted from the speech detected by the proposed clustering method, this paper proposes a speaker diarization system that uses only the mixed speech, without an explicit reference speaker signal. A telephone dataset consisting of two speakers is used to evaluate the performance of the speaker diarization system. The Source-to-Distortion Ratios (SDR) of the operator (Rx) speech and the customer (Tx) speech are 5.22 dB and -5.22 dB, respectively, before separation, and 11.26 dB and 8.53 dB, respectively, after the proposed separation.
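
The SDR figures can be made concrete with a short sketch. It uses the simple energy-ratio definition, SDR = 10 log10(||s||^2 / ||s - ŝ||^2), which is an assumption here because the abstract does not say which SDR variant the authors computed; the two toy signals are invented. One consequence of this definition: if the unprocessed two-speaker mixture is scored as the estimate for each speaker, the two SDR values are exact negatives of each other, which would be consistent with the 5.22 dB and -5.22 dB reported before separation.

```python
import math

def sdr_db(reference, estimate):
    """Source-to-Distortion Ratio in dB (simple energy-ratio definition).

    Raises ZeroDivisionError if the estimate equals the reference exactly.
    """
    signal_energy = sum(r * r for r in reference)
    distortion_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal_energy / distortion_energy)

# Hypothetical toy signals for two speakers and their unprocessed mixture.
operator = [2.0, 0.0, -2.0, 0.0]
customer = [0.0, 1.0, 0.0, -1.0]
mixture = [o + c for o, c in zip(operator, customer)]

sdr_operator = sdr_db(operator, mixture)  # distortion is the customer's speech
sdr_customer = sdr_db(customer, mixture)  # distortion is the operator's speech
```

Here the operator carries four times the energy of the customer, so the mixture scores about +6.02 dB against the operator and about -6.02 dB against the customer.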

Implementing Onetime Password based Access Control System for Secure Sharing Service

  • Kang, Namhi
    • International Journal of Internet, Broadcasting and Communication / v.13 no.3 / pp.1-11 / 2021
  • The development of ICT has led to exponential growth of various sharing-economy services over the last couple of years. The intuitive advantage of the sharing economy is the efficient use of idle goods and services, but there are safety and security concerns. In this paper, we propose a one-time password based access control system to support a secure accommodation sharing service and show the implementation results. To provide a secure service to both the provider and the user, the proposed system issues a one-time access password that is valid only during the sharing period reserved by the user; thereafter, access returns to the accommodation owner. In particular, our system provides secure user access by combining two factors, voice-based speaker recognition and a one-time password, to open and close the door lock. Although we present accommodation sharing as the use case, the proposed system is applicable to various sharing services that use security-sensitive facilities.
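
The access rule described in this abstract, a password valid only during the reserved period combined with speaker recognition, might be sketched as follows. This is a hypothetical illustration and not the paper's protocol: the HMAC-based password derivation, the function names, the 8-character password length, and the speaker-score threshold of 0.8 are all assumptions, and the speaker score is taken as given from some external recognizer.

```python
import hashlib
import hmac

def issue_password(secret: bytes, reservation_id: str, start: int, end: int) -> str:
    """Derive a short access password bound to one reservation period."""
    message = f"{reservation_id}:{start}:{end}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:8]

def may_unlock(secret: bytes, reservation_id: str, start: int, end: int,
               password: str, now: int, speaker_score: float,
               threshold: float = 0.8) -> bool:
    """Allow access only if the period, password, and speaker all check out."""
    in_period = start <= now < end  # password expires with the reservation
    expected = issue_password(secret, reservation_id, start, end)
    # Constant-time comparison on bytes to avoid leaking matching prefixes.
    password_ok = hmac.compare_digest(expected.encode(), password.encode())
    speaker_ok = speaker_score >= threshold  # score from a speaker recognizer
    return in_period and password_ok and speaker_ok

# Hypothetical reservation from t=1000 to t=2000 (Unix-style seconds).
SECRET = b"demo-secret"
guest_password = issue_password(SECRET, "booking-42", 1000, 2000)
```

For example, `may_unlock(SECRET, "booking-42", 1000, 2000, guest_password, now=1500, speaker_score=0.9)` grants access, while the same call with `now=2500` is refused because the reserved period has ended.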