• Title/Summary/Keyword: 음성인식률

Search Result 549, Processing Time 0.032 seconds

Proposal of Hostile Command Attack Method Using Audible Frequency Band for Smart Speaker (스마트 스피커 대상 가청 주파수 대역을 활용한 적대적 명령어 공격 방법 제안)

  • Park, Tae-jun;Moon, Jongsub
    • Journal of Internet Computing and Services
    • /
    • v.23 no.4
    • /
    • pp.1-9
    • /
    • 2022
  • Recently, the functions of smart speakers have diversified, and the penetration rate of smart speakers is increasing. As it becomes more widespread, various techniques have been proposed to cause anomalous behavior against smart speakers. Dolphin Attack, which causes anomalous behavior against the Voice Controllable System (VCS) during various attacks, is a representative method. With this method, a third party controls VCS using ultrasonic band (f>20kHz) without the user's recognition. However, since the method uses the ultrasonic band, it is necessary to install an ultrasonic speaker or an ultrasonic dedicated device which is capable of outputting an ultrasonic signal. In this paper, a smart speaker is controlled by generating an audio signal modulated at a frequency (18 to 20) which is difficult for a person to hear although it is in the human audible frequency band without installing an additional device, that is, an ultrasonic device. As a result with the method proposed in this paper, while humans could not recognize voice commands even in the audible band, it was possible to control the smart speaker with a probability of 82 to 96%.

Adaptive Korean Continuous Speech Recognizer to Speech Rate (발화속도 적응적인 한국어 연속음 인식기)

  • Kim, Jae-Beom;Park, Chan-Kyu;Han, Mi-Sung;Lee, Jung-Hyun
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.6
    • /
    • pp.1531-1540
    • /
    • 1997
  • In this paper, we presents automatic Korean continuous speech recognizer which is improved by the speech rate estimation and the compensation methods. Automatic continuous speech recognition is significantly more difficult than isolated word recognition because of coarticulatory effects and variations in speech rate. In order to recognize continuous speech, modeling methods of coarticulatory effects and variations in speech rate are needed. In this paper, the speech rate is measured by change of format, and the compensation is peformed by extracting relatively many feature vectors in fast speech. Coarticulatory effects are modeled by defining 514 Korean diphone set, and ETRI's 445 word DB is used for training speech material. With combining above methods, we implement automatic Korean continuous speech recognizer, which shows improved recognition rate, based on DHMM(Discrete Hidden Markov Model).

  • PDF

Decision Tree State Tying Modeling Using Parameter Estimation of Bayesian Method (Bayesian 기법의 모수 추정을 이용한 결정트리 상태 공유 모델링)

  • Oh, SangYeob
    • Journal of Digital Convergence
    • /
    • v.13 no.1
    • /
    • pp.243-248
    • /
    • 2015
  • Recognition model is not defined when you configure a model, Been added to the model after model building awareness, Model a model of the clustering due to lack of recognition models are generated by modeling is causes the degradation of the recognition rate. In order to improve decision tree state tying modeling using parameter estimation of Bayesian method. The parameter estimation method is proposed Bayesian method to navigate through the model from the results of the decision tree based on the tying state according to the maximum probability method to determine the recognition model. According to our experiments on the simulation data generated by adding noise to clean speech, the proposed clustering method error rate reduction of 1.29% compared with baseline model, which is slightly better performance than the existing approach.

A Study on Verification of Back TranScription(BTS)-based Data Construction (Back TranScription(BTS)기반 데이터 구축 검증 연구)

  • Park, Chanjun;Seo, Jaehyung;Lee, Seolhwa;Moon, Hyeonseok;Eo, Sugyeong;Lim, Heuiseok
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.11
    • /
    • pp.109-117
    • /
    • 2021
  • Recently, the use of speech-based interfaces is increasing as a means for human-computer interaction (HCI). Accordingly, interest in post-processors for correcting errors in speech recognition results is also increasing. However, a lot of human-labor is required for data construction. in order to manufacture a sequence to sequence (S2S) based speech recognition post-processor. To this end, to alleviate the limitations of the existing construction methodology, a new data construction method called Back TranScription (BTS) was proposed. BTS refers to a technology that combines TTS and STT technology to create a pseudo parallel corpus. This methodology eliminates the role of a phonetic transcriptor and can automatically generate vast amounts of training data, saving the cost. This paper verified through experiments that data should be constructed in consideration of text style and domain rather than constructing data without any criteria by extending the existing BTS research.

A Driving Information Centric Information Processing Technology Development Based on Image Processing (영상처리 기반의 운전자 중심 정보처리 기술 개발)

  • Yang, Seung-Hoon;Hong, Gwang-Soo;Kim, Byung-Gyu
    • Convergence Security Journal
    • /
    • v.12 no.6
    • /
    • pp.31-37
    • /
    • 2012
  • Today, the core technology of an automobile is becoming to IT-based convergence system technology. To cope with many kinds of situations and provide the convenience for drivers, various IT technologies are being integrated into automobile system. In this paper, we propose an convergence system, which is called Augmented Driving System (ADS), to provide high safety and convenience of drivers based on image information processing. From imaging sensor, the image data is acquisited and processed to give distance from the front car, lane, and traffic sign panel by the proposed methods. Also, a converged interface technology with camera for gesture recognition and microphone for speech recognition is provided. Based on this kind of system technology, car accident will be decreased although drivers could not recognize the dangerous situations, since the system can recognize situation or user context to give attention to the front view. Through the experiments, the proposed methods achieved over 90% of recognition in terms of traffic sign detection, lane detection, and distance measure from the front car.

The Study on the Speaker Adaptation Using Speaker Characteristics of Phoneme (음소에 따른 화자특성을 이용한 화자적응방법에 관한 연구)

  • 채나영;황영수
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2003.06a
    • /
    • pp.6-9
    • /
    • 2003
  • In this paper, we studied on the difference of speaker adaptation according to the phoneme classification for Korean Speech recognition. In order to study of speech adaptation according to the weight of difference of phoneme as recognition unit, we used SCHMM as recognition system. And Speaker adaptation method used in this paper was MAPE(Maximum A Posteriori Probability Estimation), Linear Spectral Estimation. In order to evaluate the performance of these methods, we used 10 Korean isolated numbers as the experimental data. It is possible for the first and the second methods to be carried out unsupervised learning and used in on-line system. And the first method was shown performance improvement over the second method, and hybrid adaptation showed the better recognition results than those which performed each method. And the result of Speaker adaptation using the variable weight according to the phoneme had better than the result using fixed weight.

  • PDF

A Study on home service robot interface in home environment (가정환경에서 홈 서비스로봇 인터페이스에 관한 연구)

  • Moon Yong-Seon;Kang Sung-ryul;Choi Hyeong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.10 no.9
    • /
    • pp.1710-1717
    • /
    • 2006
  • Present age the allotted span is increasing as becoming agining society gradually by scientific development and handicapped person who have native, acquired drag as long as is in mechanization of life culture is increasing. In this research to control robot through speech recognition controlling robot for handicapped person.

Sentence Rejection using Word Spotting Ratio in the Phoneme-based Recognition Network (음소기반 인식 네트워크에서의 단어 검출률을 이용한 문장거부)

  • Kim, Hyung-Tai;Ha, Jin-Young
    • Proceedings of the KSPS conference
    • /
    • 2005.04a
    • /
    • pp.99-102
    • /
    • 2005
  • Research efforts have been made for out-of-vocabulary word rejection to improve the confidence of speech recognition systems. However, little attention has been paid to non-recognition sentence rejection. According to the appearance of pronunciation correction systems using speech recognition technology, it is needed to reject non-recognition sentences to provide users with more accurate and robust results. In this paper, we introduce standard phoneme based sentence rejection system with no need of special filler models. Instead we used word spotting ratio to determine whether input sentences would be accepted or rejected. Experimental results show that we can achieve comparable performance using only standard phoneme based recognition network in terms of the average of FRR and FAR.

  • PDF

An Implementation of Realtime News Service Using RSS and VoiceXML (RSS와 VoiceXML을 이용한 실시간 뉴스 서비스의 구현)

  • Kwon, Hyeng-Joon;Kim, Dong-Gyu;Hong, Kwang-Seok
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2006.06a
    • /
    • pp.9-12
    • /
    • 2006
  • 높은 컴퓨터 보급률에 따른 인터넷의 대중화로 인하여 새로운 소식을 원하는 사람들은 기존의 정해진 시각에 전달되는 지면 신문보다 인터넷을 통해 새로운 소식을 접하는 경향이 높아지면서, 국내의 각 언론사들은 RSS(RDF Site Summary)문서를 제공하기 시작하였다. 차세대 웹인 시맨틱 웹의 여러 가지 규격 및 기술 중에서도 그 유용함과 편리성을 인정받아 우리 생활에 가장 먼저 적용되고 있는 RSS는 컨텐츠 배급을 위해 나온 XML형태의 규격 중 하나로서 웹사이트에서 사용자가 원하는 정보의 갱신된 내용을 신속하게 사용자에게 전달하는 자동 정보 수집 기술이다. 본 논문에서는 특정 언론사에서 제공하는 RSS문서에 음성인식 및 합성기술을 기반으로 동작하는 다른 XML형태의 규격인 음성 확장성 생성 언어(VoiceXML)를 접목하여 휴대전화 및 유선전화로 새로운 뉴스를 접할 수 있는 서비스를 제안하고 구현하였다. 실험 결과, 시간과 장소에 구애받지 않고 신뢰성 있는 언론사의 새로운 뉴스를 실시간으로 전달받을 수 있음을 확인하였다.

  • PDF

Korean Word Recognition using the Transition Matrix of VQ-Code and DHMM (VQ코드의 천이 행렬과 이산 HMM을 이용한 한국어 단어인식)

  • Chung, Kwang-Woo;Hong, Kwang-Seok;Park, Byung-Chul
    • The Journal of the Acoustical Society of Korea
    • /
    • v.13 no.4
    • /
    • pp.40-49
    • /
    • 1994
  • In this paper, we propose methods for improving the performance of word recognition system. The ray stratey of the first method is to apply the inertia to the feature vector sequences of speech signal to stabilize the transitions between VQ cdoes. The second method is generating the new observation probabilities using the transition matrix of VQ codes as weights at the observation probability of the output symbol, so as to take into account the time relation between neighboring frames in DHMM. By applying the inertia to the feature vector sequences, we can reduce the overlapping of probability distribution of the response paths for each word and stabilize state transitions in the HMM. By using the transition matrix of VQ codes as weights in conventional DHMM. we can divide the probability distribution of feature vectors more and more, and restrict the feature distribution to a suitable region so that the performance of recognition system can improve. To evaluate the performance of the proposed methods, we carried out experiments for 50 DDD area names. As a result, the proposed methods improved the recognition rate by $4.2\%$ in the speaker-dependent test and $12.45\%$ in the speaker-independent test, respectively, compared with the conventional DHMM.

  • PDF