• Title/Summary/Keyword: Voice recognition system


Improvement of User Recognition Rate using Multi-modal Biometrics (다중생체인식 기법을 이용한 사용자 인식률 향상)

  • Geum, Myung-Hwan;Lee, Kyu-Won;Lee, Bong-Hwan
    • Journal of the Korea Institute of Information and Communication Engineering / v.12 no.8 / pp.1456-1462 / 2008
  • In general, a single biometric-based personal authentication scheme is known to have limited room for improving the recognition rate because of the weaknesses of each individual recognition method. The recognition rate of a face recognition system can be degraded by environmental factors such as illumination, while a speaker verification system performs poorly in the presence of surrounding noise. In this paper, a multi-modal biometric system combining face and voice recognition is proposed to improve on the performance of either authentication system alone. An empirical weighted sum rule based on the reliability of each individual authentication system is applied to improve the performance of the multi-modal biometrics. Since the proposed system is implemented as a Java applet with security functions, it can be used for user authentication on the Web.
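
A minimal sketch of the score-level fusion the abstract describes, assuming an empirical weighted sum rule over normalized matcher scores; the weights (0.6/0.4), threshold, and score ranges are illustrative assumptions, not the paper's published values.

```python
def fuse_scores(face_score, voice_score, w_face=0.6, w_voice=0.4, threshold=0.5):
    """Weighted sum fusion of two matching scores in [0, 1].

    The weights stand in for the paper's empirically determined
    reliability weights; 0.6/0.4 is an assumed example.
    """
    assert abs(w_face + w_voice - 1.0) < 1e-9, "weights should sum to 1"
    fused = w_face * face_score + w_voice * voice_score
    return fused, fused >= threshold

# Example: a confident face matcher combined with a weaker voice matcher.
fused, accepted = fuse_scores(0.82, 0.55)
print(f"fused score = {fused:.3f}, accepted = {accepted}")
```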

Hand Gesture Recognition using Multivariate Fuzzy Decision Tree and User Adaptation (다변량 퍼지 의사결정트리와 사용자 적응을 이용한 손동작 인식)

  • Jeon, Moon-Jin;Do, Jun-Hyeong;Lee, Sang-Wan;Park, Kwang-Hyun;Bien, Zeung-Nam
    • The Journal of Korea Robotics Society / v.3 no.2 / pp.81-90 / 2008
  • With the increasing demand for services for the disabled and the elderly, assistive technologies have developed rapidly. Natural human signals such as voice and gesture have been applied to systems that assist disabled and elderly people. As an example of such a human-robot interface, the Soft Remote Control System has been developed by HWRS-ERC at KAIST [1]. This system is a vision-based hand gesture recognition system for controlling home appliances such as televisions, lamps, and curtains. One of its most important technologies is the hand gesture recognition algorithm. The problems that most frequently lower the recognition rate of hand gestures are inter-person variation and intra-person variation. Intra-person variation can be handled by introducing fuzzy concepts. In this paper, we propose a multivariate fuzzy decision tree (MFDT) learning and classification algorithm for hand motion recognition. To recognize the hand gestures of a new user, the most suitable recognition model among several well-trained models is selected by a model selection algorithm and incrementally adapted to the user's hand gestures. To assess the general performance of the MFDT as a classifier, we report classification rates on benchmark data from the UCI repository. For hand gesture recognition performance, we tested the system on hand gesture data collected from 10 people over 15 days. The experimental results show that the classification and user adaptation performance of the proposed algorithm is better than that of a general fuzzy decision tree.
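
The paper's MFDT algorithm is not reproduced in this listing; the sketch below only illustrates the general idea of a fuzzy decision tree whose internal nodes apply a soft (sigmoid) membership to a multivariate linear combination of features, with class probabilities accumulated over all root-to-leaf paths. The tree shape and every number are invented for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class FuzzyNode:
    """Internal node: soft split on a linear combination w.x + b."""
    def __init__(self, w, b, left, right):
        self.w, self.b, self.left, self.right = np.asarray(w), b, left, right

    def class_probs(self, x, strength=1.0):
        mu = sigmoid(self.w @ x + self.b)  # membership of the "right" branch
        return (self.left.class_probs(x, strength * (1.0 - mu)) +
                self.right.class_probs(x, strength * mu))

class Leaf:
    """Leaf: holds a class distribution, weighted by firing strength."""
    def __init__(self, probs):
        self.probs = np.asarray(probs, dtype=float)

    def class_probs(self, x, strength=1.0):
        return strength * self.probs

# A toy 2-class tree over 2-D features (all parameters made up).
tree = FuzzyNode(w=[1.0, -0.5], b=0.2,
                 left=Leaf([0.9, 0.1]),
                 right=FuzzyNode(w=[0.3, 0.8], b=-0.1,
                                 left=Leaf([0.4, 0.6]),
                                 right=Leaf([0.1, 0.9])))

x = np.array([0.7, 0.2])
probs = tree.class_probs(x)
print("class =", int(np.argmax(probs)), "probs =", probs / probs.sum())
```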


A Computer Access System for the Physically Disabled Using Eye-Tracking and Speech Recognition (아이트래킹 및 음성인식 기술을 활용한 지체장애인 컴퓨터 접근 시스템)

  • Kwak, Seongeun;Kim, Isaac;Sim, Debora;Lee, Seung Hwan;Hwang, Sung Soo
    • Journal of the HCI Society of Korea / v.12 no.4 / pp.5-15 / 2017
  • Alternative computer access devices are one way for the physically disabled to satisfy their desire to participate in social activities. Most of these devices provide access to computers through the feet or the head, but controlling a mouse this way is not easy for users with physical disabilities. In this paper, we propose a computer access system for the physically disabled. The proposed system moves the mouse pointer using only the user's gaze, via eye-tracking technology. The mouse can be clicked through an external button that is relatively easy to press, and text can be entered easily and quickly through voice recognition. The system also provides detailed functions such as right-click, double-click, drag, an on-screen keyboard, Internet access, and scrolling.
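
As a rough illustration of the gaze-driven pointer control described above, the sketch below maps normalized gaze coordinates from a hypothetical eye tracker to screen positions with simple exponential smoothing; pyautogui is a stand-in, not the paper's implementation.

```python
import pyautogui  # third-party; pip install pyautogui

SCREEN_W, SCREEN_H = pyautogui.size()
ALPHA = 0.3  # exponential smoothing factor to damp gaze jitter

def move_cursor(gaze_xy, state={"x": 0.5, "y": 0.5}):
    """Map a normalized gaze point (0..1, 0..1) to a screen position.

    `gaze_xy` would come from an eye tracker SDK (hypothetical here).
    """
    gx, gy = gaze_xy
    state["x"] = ALPHA * gx + (1 - ALPHA) * state["x"]
    state["y"] = ALPHA * gy + (1 - ALPHA) * state["y"]
    pyautogui.moveTo(int(state["x"] * SCREEN_W), int(state["y"] * SCREEN_H))

# Example: feed a few fake gaze samples drifting to the upper right.
for sample in [(0.5, 0.5), (0.6, 0.45), (0.7, 0.4)]:
    move_cursor(sample)
```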

Implementation of Vocabulary-Independent Speech Recognizer Using a DSP (DSP를 이용한 가변어휘 음성인식기 구현에 관한 연구)

  • Chung, Ik-Joo
    • Speech Sciences / v.11 no.3 / pp.143-156 / 2004
  • In this paper, we implemented a vocabulary-independent speech recognizer using the TMS320VC33 DSP. For this implementation, we developed a very small recognition engine based on diphone sub-word units, which is especially suited to embedded applications where system resources are severely limited. The recognition accuracy of the developed recognizer, with 1 mixture per state and 4 states per diphone, is 94.5% when tested on a set of 2,000 frequently used words. The hardware design focused on minimizing the number of parts, which reduces material cost: the final hardware includes only a DSP, 512 Kwords of flash ROM, and a voice codec. In porting the recognition engine to the DSP, we introduced several methods for using data and program memory efficiently and developed a versatile software protocol for the host interface. Finally, we built an evaluation board for testing the developed hardware recognition module.
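
The recognizer's engine itself is not published in this listing; the sketch below only illustrates the model structure the abstract states (one Gaussian mixture per state, four states per diphone) with a toy left-to-right Viterbi scorer. All parameters are random placeholders.

```python
import numpy as np

def log_gauss(x, mean, var):
    """Log-likelihood of x under a diagonal Gaussian (one mixture per state)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def viterbi_score(features, means, variances, self_loop=-0.1, forward=-2.0):
    """Best-path log score of a feature sequence through a left-to-right
    chain of states (e.g. 4 states per diphone, diphones concatenated)."""
    T, S = len(features), len(means)
    delta = np.full((T, S), -np.inf)
    delta[0, 0] = log_gauss(features[0], means[0], variances[0])
    for t in range(1, T):
        for s in range(S):
            stay = delta[t - 1, s] + self_loop
            move = delta[t - 1, s - 1] + forward if s > 0 else -np.inf
            delta[t, s] = max(stay, move) + log_gauss(features[t], means[s], variances[s])
    return delta[-1, -1]

# Toy word model: 2 diphones x 4 states = 8 states over 12-dim features.
rng = np.random.default_rng(0)
means, variances = rng.normal(size=(8, 12)), np.ones((8, 12))
obs = rng.normal(size=(20, 12))
print("log score:", viterbi_score(obs, means, variances))
```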


A Study on the Implementation of a User Identification System Using Biometric Features (생물학적 특징을 이용한 사용자 인증시스템 구현)

  • 문용선;정택준
    • Journal of the Korea Institute of Information and Communication Engineering / v.6 no.2 / pp.346-355 / 2002
  • This study offers multimodal recognition, instead of existing monomodal biometrics, by using face, lip, and voice features to improve recognition accuracy. Each biometric feature vector is obtained as follows. For the face, features are calculated by principal component analysis on a wavelet multiresolution decomposition. For the lips, a filter is first used to detect the lip edges; then, using a thinned image and the least squares method, the coefficients of a fitted curve are derived. Voice features are extracted as mel-frequency cepstral coefficients (MFCC). A backpropagation neural network is trained with the features above as inputs, and classification experiments are performed. Based on the experimental results, we discuss the advantages and efficiency of the approach.
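
For the voice channel, the abstract names MFCC features; the sketch below shows the same feature extraction in modern terms using librosa, which this 2002 paper obviously did not use. The file name and sampling rate are assumptions.

```python
import librosa
import numpy as np

def voice_features(wav_path, n_mfcc=13):
    """Extract MFCCs and average them over time into one feature vector."""
    y, sr = librosa.load(wav_path, sr=16000)                 # mono, 16 kHz
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    return np.mean(mfcc, axis=1)                             # crude utterance-level vector

# vec = voice_features("speaker01.wav")  # hypothetical file name
# print(vec.shape)                       # (13,)
```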

Speech Recognition Model Based on CNN using Spectrogram (스펙트로그램을 이용한 CNN 음성인식 모델)

  • Won-Seog Jeong;Haeng-Woo Lee
    • The Journal of the Korea Institute of Electronic Communication Sciences / v.19 no.4 / pp.685-692 / 2024
  • In this paper, we propose a new CNN model to improve the recognition performance for spoken commands. The method applies a short-time Fourier transform (STFT) to the input signal to obtain a spectrogram image and then performs supervised multi-class learning on these images with a CNN deep learning model. Converting the time-domain voice signal to the frequency domain expresses its characteristics well, so training on the spectrogram images classifies commands effectively. To verify the performance of the proposed speech recognition system, we built a simulation program using the TensorFlow and Keras libraries and carried out simulation experiments. The experiments confirmed that an accuracy of 92.5% can be obtained with the proposed deep learning algorithm.
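
The abstract names TensorFlow and Keras but not the exact architecture, so the model below is an assumed minimal CNN over STFT spectrogram images, shown only to make the spectrogram-then-CNN pipeline concrete; the layer sizes and number of command classes are placeholders.

```python
import tensorflow as tf

def to_spectrogram(waveform, frame_length=256, frame_step=128):
    """Short-time Fourier transform -> log-magnitude spectrogram image."""
    stft = tf.signal.stft(waveform, frame_length=frame_length, frame_step=frame_step)
    spec = tf.math.log(tf.abs(stft) + 1e-6)
    return spec[..., tf.newaxis]  # add a channel axis for Conv2D

NUM_COMMANDS = 10  # assumed number of command classes

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, 129, 1)),  # frames x freq bins x 1
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_COMMANDS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Example: one second of fake audio at 16 kHz -> spectrogram -> prediction.
wave = tf.random.normal([16000])
spec = to_spectrogram(wave)[tf.newaxis, ...]  # batch of 1
print(model.predict(spec).shape)              # (1, NUM_COMMANDS)
```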

Untact-based Elevator Operating System Design for Private Buildings Using Deep Learning (프라이빗 건물의 딥러닝을 활용한 언택트 기반 엘리베이터 운영시스템 설계)

  • Lee, Min-hye;Kang, Sun-kyoung;Shin, Seong-yoon;Mun, Hyung-jin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.161-163 / 2021
  • In an apartment or private building, it is difficult for a user to press the elevator buttons when, for example, both hands are full of luggage. In an environment where human contact must be minimized because of a highly infectious virus such as COVID-19, contact-free ("untact") elevator operation becomes unavoidable. This paper proposes an operating system that controls the elevator with the user's voice and with image processing of the user's face, without pressing any elevator buttons. A camera installed in the elevator detects the face of a person entering, matches it against pre-registered information, and sends the elevator to the designated floor without any button press. When a person's face is hard to recognize, the system controls the elevator floor using the user's voice through a microphone and automatically records access information, enhancing the convenience of elevator use in a contact-free environment.
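
The paper's deep-learning face matcher is not given here; the sketch below shows only the front end of such a pipeline, detecting a face in an elevator camera frame, with OpenCV's bundled Haar cascade standing in for the actual model. The camera device index is an assumption.

```python
import cv2

# OpenCV ships this cascade file; it stands in for the paper's deep model.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    """Return bounding boxes (x, y, w, h) of faces in a BGR camera frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Example with the elevator camera (device 0 is an assumption).
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    for (x, y, w, h) in detect_faces(frame):
        print("face at", (x, y, w, h))  # match against registered users here
cap.release()
```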


Development of a Voice-activated Map Information Retrieval System based on MFC (MFC 기반 음성구동 수치지도정보 검색시스템의 구현)

  • Kim, Nag-Cheol;Kim, Tae-Soo;Jo, Myung-Hee;Chung, Hyun-Yeol
    • Journal of the Korean Association of Geographic Information Studies / v.3 no.1 / pp.69-77 / 2000
  • When retrieving and analyzing digital map information with a mouse or keyboard, designating the range of a study area requires several repeated mouse operations. In this study, we propose a voice-activated map information retrieval system that eliminates such repetition, and we implemented the system on a personal computer. The system was built in two ways for controlling the map window display: a traditional OLE (object linking and embedding) method and an MFC (Microsoft Foundation Classes) method. For the performance evaluation, the retrieval vocabulary consisted of 68 words uttered by three male speakers, covering attribute words and control words for the Susung-gu area of Taegu city on a 1:5,000 map. We obtained an average recognition rate of 98.02% in online tests in an office environment, with operating speeds of 5.39 seconds for OLE and 10.38 seconds for MFC. These results show that an information retrieval system using speech recognition is practical for digital maps.
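
A sketch of the word-to-action dispatch such a system implies: a recognized attribute or control word is mapped onto a map-window operation. The English word list and the toy viewport are invented; the paper's actual vocabulary was 68 Korean words.

```python
# Hypothetical mapping from recognized control words to viewport actions.
CONTROL_ACTIONS = {
    "zoom in":    lambda view: view.update(scale=view["scale"] * 2),
    "zoom out":   lambda view: view.update(scale=view["scale"] / 2),
    "move north": lambda view: view.update(y=view["y"] + view["scale"]),
}

def dispatch(word, view):
    """Apply the action for a recognized word, if any."""
    action = CONTROL_ACTIONS.get(word)
    if action is None:
        print("attribute query:", word)  # e.g. look up a map layer
    else:
        action(view)

view = {"x": 0.0, "y": 0.0, "scale": 1.0}
dispatch("zoom in", view)
print(view)  # {'x': 0.0, 'y': 0.0, 'scale': 2.0}
```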


Study of a Wheelchair Controlled by Joystick and Voice (조이스틱제어 및 음성으로 제어되는 휠체어의 연구)

  • Min, Hea-Jung;Yoon, Hung-Ri
    • Proceedings of the KIEE Conference / 1988.07a / pp.723-726 / 1988
  • This paper studies the automatic control of wheelchairs, realized with a joystick and simulated with voice signal recognition. The joystick control system is designed as follows: the joystick paddle is connected to a timer whose output is high only while the joystick is moved; a computer reads the duration of this high state and outputs a motor control word determined from that value via a look-up table. The voice control system is designed as follows: partial autocorrelation (PARCOR) coefficients are computed from the A/D-converted signal and compared with reference patterns, and the motor control word is decided by the nearest neighbor rule.
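
A sketch of the voice path the abstract describes: PARCOR (partial autocorrelation) coefficients computed with the Levinson-Durbin recursion, then classified against reference patterns by the nearest neighbor rule. The analysis order and the reference patterns are placeholders.

```python
import numpy as np

def parcor(signal, order=8):
    """PARCOR (reflection) coefficients via the Levinson-Durbin recursion."""
    n = len(signal)
    r = np.correlate(signal, signal, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    k = np.zeros(order)
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k[i - 1] = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k[i - 1] * prev[i - j]
        a[i] = k[i - 1]
        err *= 1.0 - k[i - 1] ** 2
    return k

def classify(signal, references):
    """Nearest neighbor rule over PARCOR feature vectors."""
    feat = parcor(signal)
    return min(references, key=lambda cmd: np.linalg.norm(feat - references[cmd]))

# Toy reference patterns for four motor control words (random placeholders).
rng = np.random.default_rng(1)
refs = {cmd: rng.normal(size=8) for cmd in ["forward", "back", "left", "right"]}
print(classify(rng.normal(size=400), refs))
```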


Design and Implementation of Mobile Communication System for Hearing-Impaired Person (청각 장애인을 위한 모바일 통화 시스템 설계 및 구현)

  • Yun, Dong-Hee;Kim, Young-Ung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.16 no.5 / pp.111-116 / 2016
  • According to a survey on the information gap by the Ministry of Science, ICT and Future Planning, the smartphone ownership rate of disabled people is only one third that of non-disabled people, so people with disabilities have significantly less access to information. In this paper, we develop an application, CallHelper, that makes mobile voice calls more convenient for hearing-impaired people. CallHelper runs automatically when a call comes in, translates the caller's voice into text shown on the mobile screen, and displays emoticons visualizing the emotion inferred from the caller's voice. It also saves the voice, the translated text, and the emotion data so they can be played back.
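
CallHelper's internal speech pipeline is not given; the sketch below illustrates the core idea, turning a caller's voice into on-screen text, using the off-the-shelf SpeechRecognition package as a stand-in engine. The file name is hypothetical.

```python
import speech_recognition as sr  # pip install SpeechRecognition

def transcribe(wav_path, language="ko-KR"):
    """Transcribe a recorded call segment to text (Google Web Speech API)."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return ""  # nothing intelligible in this segment

# text = transcribe("incoming_call.wav")  # hypothetical file
# print(text)  # would be rendered on the phone screen for the user
```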