• Title/Summary/Keyword: distributed speech recognition (분산 음성 인식)

Search results: 56

Development of intelligent IoT control-related AI distributed speech recognition module (지능형 IoT 관제 연계형 AI 분산음성인식 모듈개발)

  • Bae, Gi-Tae;Lee, Hee-Soo;Bae, Su-Bin
    • Proceedings of the Korea Information Processing Society Conference / 2017.11a / pp.1212-1215 / 2017
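  • We built a multi-AI speaker that reproduces the functions of currently available AI speakers while identifying and remedying their shortcomings. In particular, as a way to alleviate the social problems caused by the rapid increase in single-person households in Korea, it provides an AI service capable of emotional conversation that approaches the user first through facial-expression recognition, home IoT control that does not depend on the internet environment, and visual data.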

Effects of Situation Awareness and Decision Making on Safety, Workload and Trust in Autonomous Vehicle Take-over Situations (자율주행 자동차의 제어권 전환상황에서 상황인식 및 의사결정 정보 제공이 운전자에게 미치는 영향)

  • Kim, Jihyun;Lee, Kahyun;Byun, Youngsi
    • Journal of the HCI Society of Korea / v.14 no.2 / pp.21-29 / 2019
  • Take-over requests in semi-autonomous cars must be handled properly in the case of road obstacles or curved roads in order to avoid accidents. In these situations, situation awareness and appropriate decision making are essential for distracted drivers. This study used a driving simulator to investigate the components of auditory-visual information systems that affect safety, workload, and trust. Auditory information consisted of either voice guidance providing situation awareness for the take-over or a beep sound that only alerted the driver. Visual information consisted of either a screen showing how to maneuver the vehicle or only an icon indicating a take-over situation. By providing auditory information that increased situation awareness and visual information that aided decision making, trust and safety increased, while workload decreased. These results suggest that the levels of situation awareness and decision making ability affect trust, safety, and workload for drivers.

Noise Reduction Algorithm using Average Estimator Least Mean Square Filter of Frame Basis (프레임 단위의 AELMS를 이용한 잡음 제거 알고리즘)

  • Ahn, Chan-Shik;Choi, Ki-Ho
    • Journal of Digital Convergence / v.11 no.7 / pp.135-140 / 2013
  • Noise estimation and detection algorithms use the LMS filter to adapt quickly to changing noise environments. However, the LMS filter estimates noise over a certain period and needs time to adapt, and when the signal changes the adaptation takes even longer. To compensate for this, a noise removal method using a frame-based AELMS filter is proposed. In this paper, the input signal in a noisy environment is split into frames, and the LMS filter is configured with noise predictions based on the mean and variance to remove the noise. Even when the noise environment changes, the adaptation time is short and the noise is removed. When environmental noise is mixed with the speech input, the method removes the noise while preserving the distinctive characteristics of the voice, which reduces damage to the speech information. The noise removal performance of the frame-based AELMS filter was evaluated experimentally. In the experiments, the attenuation obtained by removing noise in a changing environment improved by an average of 6.8 dB.
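For orientation, the plain LMS update that the abstract treats as its starting point can be sketched as below. This is not the AELMS method: the abstract does not give its equations, so the frame splitting and the mean/variance estimator are not reproduced. The sketch also assumes a two-channel setup with a separate noise reference, and the names `lms_cancel` and `demo`, the tap count, and the step size `mu` are illustrative choices only.

```python
import math
import random


def lms_cancel(primary, reference, taps=4, mu=0.05):
    """Plain LMS adaptive noise canceller (assumed two-channel setup).

    primary   : speech plus filtered noise
    reference : the noise alone
    Returns the error signal, which serves as the cleaned output.
    """
    w = [0.0] * taps      # adaptive filter weights
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for d, x in zip(primary, reference):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))        # noise estimate
        e = d - y                                         # cleaned sample
        w = [wi + mu * e * xi for wi, xi in zip(w, buf)]  # LMS weight update
        out.append(e)
    return out


def demo(n=4000, seed=0):
    """Synthetic check: sine 'speech' plus a two-tap filtered noise."""
    rng = random.Random(seed)
    speech = [0.5 * math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
    noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    primary = [speech[k] + 0.5 * noise[k] + 0.3 * (noise[k - 1] if k else 0.0)
               for k in range(n)]
    cleaned = lms_cancel(primary, noise)
    half = n // 2  # score only the second half, after the weights have adapted

    def mse(a, b):
        return sum((p - q) ** 2 for p, q in zip(a[half:], b[half:])) / (n - half)

    return mse(primary, speech), mse(cleaned, speech)
```

Calling `demo()` returns the mean squared error against the clean signal before and after filtering. The adaptation delay visible in the first samples is the cost the abstract says the frame-based variant is meant to reduce.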

On the speaker's position estimation using TDOA algorithm in vehicle environments (자동차 환경에서 TDOA를 이용한 화자위치추정 방법)

  • Lee, Sang-Hun;Choi, Hong-Sub
    • Journal of Digital Contents Society / v.17 no.2 / pp.71-79 / 2016
  • This study compares the performance of sound source localization methods used to improve the speech recognition rate, and thereby support stable vehicle control, in automobile environments, and suggests how to improve them. Sound source location estimation generally employs the TDOA algorithm, which can be computed in two ways: with a cross-correlation function in the time domain, or with GCC-PHAT in the frequency domain. Of the two, GCC-PHAT is known to be more robust to echo and noise than the cross-correlation function. This study compared the two methods in an automobile environment full of echo and vibration noise and additionally proposed applying a median filter. We found that the median filter improves the performance of both estimation methods and reduces their variance. According to the experimental results, the two methods perform almost identically on speech input; with a song as the input signal, however, GCC-PHAT achieves a recognition rate 10% higher than the cross-correlation function. When the median filter was added, the recognition rate of the cross-correlation function improved by up to 11%. In terms of variance, both methods showed stable performance.
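The two TDOA estimators named in this abstract, and the median filter applied to their per-frame outputs, can be sketched roughly as follows. This is an illustration under assumptions, not the authors' implementation: the frame length, the lag search range, the filter width, and all function names are hypothetical, and a naive DFT stands in for an FFT so that the example needs only the standard library.

```python
import cmath
import random
from statistics import median


def xcorr_tdoa(a, b, max_lag):
    """Time-domain cross-correlation: lag (samples) by which b trails a."""
    n = len(a)

    def corr(lag):
        return sum(a[i] * b[i + lag]
                   for i in range(max(0, -lag), min(n, n - lag)))

    return max(range(-max_lag, max_lag + 1), key=corr)


def dft(x, inverse=False):
    """Naive O(n^2) DFT, adequate for short illustrative frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
               for k in range(n)) for m in range(n)]
    return [v / n for v in out] if inverse else out


def gcc_phat_tdoa(a, b, max_lag):
    """GCC-PHAT: keep only the phase of the cross-spectrum, then find the peak."""
    pad = [0.0] * len(a)  # zero-padding avoids circular wrap-around
    A, B = dft(list(a) + pad), dft(list(b) + pad)
    cross = [x.conjugate() * y for x, y in zip(A, B)]
    phat = [c / abs(c) if abs(c) > 1e-12 else 0.0 for c in cross]
    r = dft(phat, inverse=True)
    size = len(r)
    return max(range(-max_lag, max_lag + 1), key=lambda lag: r[lag % size].real)


def frame_estimates(mic1, mic2, estimator, frame=64, max_lag=8):
    """One TDOA estimate per non-overlapping frame."""
    return [estimator(mic1[s:s + frame], mic2[s:s + frame], max_lag)
            for s in range(0, len(mic1) - frame + 1, frame)]


def median_filter(values, width=3):
    """Sliding median over the estimate sequence to suppress isolated outliers."""
    half = width // 2
    return [median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]


def synthetic_pair(delay=3, n=400, seed=1):
    """White-noise source seen by two microphones, the second delayed."""
    rng = random.Random(seed)
    sig = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return sig[8:n - 8], sig[8 - delay:n - 8 - delay]
```

A typical use would be `median_filter(frame_estimates(mic1, mic2, gcc_phat_tdoa))` on the output of `synthetic_pair()`. The echo and vibration noise of a real cabin, which the study focuses on, is not modeled here.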

A Study on the Optimization of State Tying Acoustic Models using Mixture Gaussian Clustering (혼합 가우시안 군집화를 이용한 상태공유 음향모델 최적화)

  • Ann, Tae-Ock
    • Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.6 / pp.167-176 / 2005
  • This paper describes how to optimize a decision-tree-based state-tying model, one of the acoustic models used for speech recognition, by reducing the number of mixture Gaussians in the output probability distribution. State-tying modeling uses a finite set of questions, which can incorporate phonological knowledge, together with likelihood-based decision criteria. The recognition rate can be improved by increasing the number of mixture Gaussians in the output probability distribution. In this paper, we reduce the number of mixture Gaussians at the point of highest recognition rate by clustering the Gaussians. The Bhattacharyya and Euclidean distances are used as the distance measures for clustering. A new Gaussian is created from the pair with the lowest distance by computing their mean and variance, so its parameters are derived from those of the Gaussians it replaces. Experiments were performed on the STOCKNAME (1,680) database. The results show that the proposed method with the Bhattacharyya distance maintains the recognition rate at 97.2% while reducing the ratio of the number of mixture Gaussians by 1.0%, and the method with the Euclidean distance maintains the recognition rate at 96.9% while reducing the ratio by 1.0%. Both methods can therefore optimize the state-tying model.
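The clustering step described in this abstract, which repeatedly merges the closest pair of Gaussians, might look like the sketch below for diagonal-covariance components. The Bhattacharyya distance is the standard closed form for Gaussians. The weight-preserving moment-matching merge is an assumption, since the abstract says only that the new parameters are derived from the merged pair, and the tuple layout `(weight, means, variances)` and the function names are hypothetical.

```python
import math
from itertools import combinations


def bhattacharyya(g1, g2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    _, m1, v1 = g1
    _, m2, v2 = g2
    return sum((a - b) ** 2 / (4.0 * (p + q))
               + 0.5 * math.log((p + q) / (2.0 * math.sqrt(p * q)))
               for a, b, p, q in zip(m1, m2, v1, v2))


def merge(g1, g2):
    """Moment-matching merge (assumed rule): keeps total weight, mean, variance."""
    w1, m1, v1 = g1
    w2, m2, v2 = g2
    w = w1 + w2
    mean = [(w1 * a + w2 * b) / w for a, b in zip(m1, m2)]
    var = [(w1 * (p + a * a) + w2 * (q + b * b)) / w - m * m
           for a, b, p, q, m in zip(m1, m2, v1, v2, mean)]
    return (w, mean, var)


def reduce_mixture(components, target):
    """Greedily merge the closest pair until `target` components remain."""
    comps = list(components)
    while len(comps) > target:
        i, j = min(combinations(range(len(comps)), 2),
                   key=lambda ij: bhattacharyya(comps[ij[0]], comps[ij[1]]))
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return comps
```

For example, `reduce_mixture([(0.5, [0.0], [1.0]), (0.3, [0.1], [1.0]), (0.2, [5.0], [1.0])], 2)` merges the two nearly coincident components and leaves the distant one untouched. To try the Euclidean alternative mentioned in the abstract, replace the `key` function with a distance between the mean vectors.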

A Comparative Performance Analysis of Spark-Based Distributed Deep-Learning Frameworks (스파크 기반 딥 러닝 분산 프레임워크 성능 비교 분석)

  • Jang, Jaehee;Park, Jaehong;Kim, Hanjoo;Yoon, Sungroh
    • KIISE Transactions on Computing Practices / v.23 no.5 / pp.299-303 / 2017
  • By stacking hidden layers in artificial neural networks, deep learning delivers outstanding performance on high-level abstraction problems such as object/speech recognition and natural language processing. However, deep-learning users often struggle with the tremendous amounts of time and resources required to train deep neural networks. To alleviate this computational challenge, many approaches have been proposed in a variety of areas. In this work, two existing Apache Spark-based acceleration frameworks for deep learning (SparkNet and DeepSpark) are compared and analyzed in terms of training accuracy and time demands. In the authors' experiments with the CIFAR-10 and CIFAR-100 benchmark datasets, SparkNet showed more stable convergence behavior than DeepSpark, but DeepSpark delivered classification accuracy approximately 15% higher. In some cases, DeepSpark also outperformed the sequential implementation running on a single machine in terms of both accuracy and running time.