• Title/Summary/Keyword: Speech recognition model

Search Result 623, Processing Time 0.033 seconds

Statistical Model-Based Voice Activity Detection Using Spatial Cues for Dual-Channel Noisy Speech Recognition (이중채널 잡음음성인식을 위한 공간정보를 이용한 통계모델 기반 음성구간 검출)

  • Shin, Min-Hwa;Park, Ji-Hun;Kim, Hong-Kook
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2010.07a
    • /
    • pp.150-151
    • /
    • 2010
  • 본 논문에서는 잡음환경에서의 이중채널 음성인식을 위한 통계모델 기반 음성구간 검출 방법을 제안한다. 제안된 방법에서는 다채널 입력 신호로부터 얻어진 공간정보를 이용하여 음성 존재 및 부재 확률모델을 구하고 이를 통해 음성구간 검출을 행한다. 이때, 공간정보는 두 채널간의 상호 시간 차이와 상호 크기 차이로, 음성 존재 및 부재 확률은 가우시안 커널 밀도 기반의 확률모델로 표현된다. 그리고 음성구간은 각 시간 프레임 별 음성 존재 확률 대비 음성 부재 확률의 비를 추정하여 검출된다. 제안된 음성구간 검출 방법의 평가를 위해 검출된 구간만을 입력으로 하는 음성인식 성능을 측정한다. 실험결과, 제안된 공간정보를 이용하는 통계모델 기반의 음성구간 검출 방법이 주파수 에너지를 이용하는 통계모델 기반의 음성구간 검출 방법과 주파수 스펙트럼 밀도 기반 음성구간 검출 방법에 비해 각각 15.6%, 15.4%의 상대적 오인식률 개선을 보였다.

  • PDF

The Comparison of OC1 and CART for Prosodic Boundary Index Prediction (운율 경계강도 예측을 위한 OC1의 적용 및 CART와의 비교)

  • 임동식;김진영;김선미
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.4
    • /
    • pp.60-64
    • /
    • 1999
  • In this paper, we apply CART(Classification And Regression tree) and OC1(Oblique Classifier1) which methods are widely used for continuous speech recognition and synthesis. We prediet prosodic boundary index by applying CART and OC1, which combine right depth of tree-structured method and To_Right of link grammar method with tri_gram model. We assigned four prosodic boundary index level from 0 to 3. Experimental results show that OC1 method is superior to CART method. In other words, in spite of OC1's having fewer nodes than CART, it can make more improved prediction than CART.

  • PDF

A clustering algorithm of statistical langauge model and its application on speech recognition (통계적 언어 모델의 clustering 알고리즘과 음성인식에의 적용)

  • Kim, Woo-Sung;Koo, Myoung-Wan
    • Annual Conference on Human and Language Technology
    • /
    • 1996.10a
    • /
    • pp.145-152
    • /
    • 1996
  • 연속음성인식 시스템을 개발하기 위해서는 언어가 갖는 문법적 제약을 이용한 언어모델이 요구된다. 문법적 규칙을 이용한 언어모델은 전문가가 일일이 문법 규칙을 만들어 주어야 하는 단점이 있다. 통계적 언어 모델에서는 문법적인 정보를 수작업으로 만들어 주지 않는 대신 그러한 모든 정보를 학습을 통해서 훈련해야 하기 때문에 이를 위해 요구되는 학습 데이터도 엄청나게 증가한다. 따라서 적은 양의 데이터로도 이와 유사한 효과를 보일 수 있는 것이 클래스에 의거한 언어 모델이다. 또 이 모델은 음성 인식과 연계시에 탐색 공간을 줄여 주기 때문에 실시간 시스템 구현에 매우 유용한 모델이다. 여기서는 자동으로 클래스를 찾아주는 알고리즘을 호텔예약시스템의 corpus에 적용, 분석해 보았다. Corpus 자체가 문법규칙이 뚜렷한 특성을 갖고 있기 때문에 heuristic하게 클래스를 준 것과 유사한 결과를 보였지만 corpus 크기가 커질 경우에는 매우 유용할 것이며, initial map을 heuristic하게 주고 그 알고리즘을 적용한 결과 약간의 성능향상을 볼 수 있었다. 끝으로 음성인식시스템과 접합해 본 결과 유사한 결과를 얻었으며 언어모델에도 음향학적 특성을 반영할 수 있는 연구가 요구됨을 알 수 있었다.

  • PDF

A study on complexity of deep learning model (딥러닝 모형의 복잡도에 관한 연구)

  • Kim, Dongha;Baek, Gyuseung;Kim, Yongdai
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.6
    • /
    • pp.1217-1227
    • /
    • 2017
  • Deep learning has been studied explosively and has achieved excellent performance in areas like image and speech recognition, the application areas in which computations have been challenges with ordinary machine learning techniques. The theoretical study of deep learning has also been researched toward improving the performance. In this paper, we try to find a key of the success of the deep learning in rich and efficient expressiveness of the deep learning function, and analyze the theoretical studies related to it.

Emotional Intelligence System for Ubiquitous Smart Foreign Language Education Based on Neural Mechanism

  • Dai, Weihui;Huang, Shuang;Zhou, Xuan;Yu, Xueer;Ivanovi, Mirjana;Xu, Dongrong
    • Journal of Information Technology Applications and Management
    • /
    • v.21 no.3
    • /
    • pp.65-77
    • /
    • 2014
  • Ubiquitous learning has aroused great interest and is becoming a new way for foreign language education in today's society. However, how to increase the learners' initiative and their community cohesion is still an issue that deserves more profound research and studies. Emotional intelligence can help to detect the learner's emotional reactions online, and therefore stimulate his interest and the willingness to participate by adjusting teaching skills and creating fun experiences in learning. This is, actually the new concept of smart education. Based on the previous research, this paper concluded a neural mechanism model for analyzing the learners' emotional characteristics in ubiquitous environment, and discussed the intelligent monitoring and automatic recognition of emotions from the learners' speech signals as well as their behavior data by multi-agent system. Finally, a framework of emotional intelligence system was proposed concerning the smart foreign language education in ubiquitous learning.

Machine Learning Based Domain Classification for Korean Dialog System (기계학습을 이용한 한국어 대화시스템 도메인 분류)

  • Jeong, Young-Seob
    • Journal of Convergence for Information Technology
    • /
    • v.9 no.8
    • /
    • pp.1-8
    • /
    • 2019
  • Dialog system is becoming a new dominant interaction way between human and computer. It allows people to be provided with various services through natural language. The dialog system has a common structure of a pipeline consisting of several modules (e.g., speech recognition, natural language understanding, and dialog management). In this paper, we tackle a task of domain classification for the natural language understanding module by employing machine learning models such as convolutional neural network and random forest. For our dataset of seven service domains, we showed that the random forest model achieved the best performance (F1 score 0.97). As a future work, we will keep finding a better approach for domain classification by investigating other machine learning models.

Structural live load surveys by deep learning

  • Li, Yang;Chen, Jun
    • Smart Structures and Systems
    • /
    • v.30 no.2
    • /
    • pp.145-157
    • /
    • 2022
  • The design of safe and economical structures depends on the reliable live load from load survey. Live load surveys are traditionally conducted by randomly selecting rooms and weighing each item on-site, a method that has problems of low efficiency, high cost, and long cycle time. This paper proposes a deep learning-based method combined with Internet big data to perform live load surveys. The proposed survey method utilizes multi-source heterogeneous data, such as images, voice, and product identification, to obtain the live load without weighing each item through object detection, web crawler, and speech recognition. The indoor objects and face detection models are first developed based on fine-tuning the YOLOv3 algorithm to detect target objects and obtain the number of people in a room, respectively. Each detection model is evaluated using the independent testing set. Then web crawler frameworks with keyword and image retrieval are established to extract the weight information of detected objects from Internet big data. The live load in a room is derived by combining the weight and number of items and people. To verify the feasibility of the proposed survey method, a live load survey is carried out for a meeting room. The results show that, compared with the traditional method of sampling and weighing, the proposed method could perform efficient and convenient live load surveys and represents a new load research paradigm.

ICLAL: In-Context Learning-Based Audio-Language Multi-Modal Deep Learning Models (ICLAL: 인 컨텍스트 러닝 기반 오디오-언어 멀티 모달 딥러닝 모델)

  • Jun Yeong Park;Jinyoung Yeo;Go-Eun Lee;Chang Hwan Choi;Sang-Il Choi
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2023.11a
    • /
    • pp.514-517
    • /
    • 2023
  • 본 연구는 인 컨택스트 러닝 (In-Context Learning)을 오디오-언어 작업에 적용하기 위한 멀티모달 (Multi-Modal) 딥러닝 모델을 다룬다. 해당 모델을 통해 학습 단계에서 오디오와 텍스트의 소통 가능한 형태의 표현 (Representation)을 학습하고 여러가지 오디오-텍스트 작업을 수행할 수 있는 멀티모달 딥러닝 모델을 개발하는 것이 본 연구의 목적이다. 모델은 오디오 인코더와 언어 인코더가 연결된 구조를 가지고 있으며, 언어 모델은 6.7B, 30B 의 파라미터 수를 가진 자동회귀 (Autoregressive) 대형 언어 모델 (Large Language Model)을 사용한다 오디오 인코더는 자기지도학습 (Self-Supervised Learning)을 기반으로 사전학습 된 오디오 특징 추출 모델이다. 언어모델이 상대적으로 대용량이기 언어모델의 파라미터를 고정하고 오디오 인코더의 파라미터만 업데이트하는 프로즌 (Frozen) 방법으로 학습한다. 학습을 위한 과제는 음성인식 (Automatic Speech Recognition)과 요약 (Abstractive Summarization) 이다. 학습을 마친 후 질의응답 (Question Answering) 작업으로 테스트를 진행했다. 그 결과, 정답 문장을 생성하기 위해서는 추가적인 학습이 필요한 것으로 보였으나, 음성인식으로 사전학습 한 모델의 경우 정답과 유사한 키워드를 사용하는 문법적으로 올바른 문장을 생성함을 확인했다.

Deep Neural Network Model For Short-term Electric Peak Load Forecasting (단기 전력 부하 첨두치 예측을 위한 심층 신경회로망 모델)

  • Hwang, Heesoo
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.5
    • /
    • pp.1-6
    • /
    • 2018
  • In smart grid an accurate load forecasting is crucial in planning resources, which aids in improving its operation efficiency and reducing the dynamic uncertainties of energy systems. Research in this area has included the use of shallow neural networks and other machine learning techniques to solve this problem. Recent researches in the field of computer vision and speech recognition, have shown great promise for Deep Neural Networks (DNN). To improve the performance of daily electric peak load forecasting the paper presents a new deep neural network model which has the architecture of two multi-layer neural networks being serially connected. The proposed network model is progressively pre-learned layer by layer ahead of learning the whole network. For both one day and two day ahead peak load forecasting the proposed models are trained and tested using four years of hourly load data obtained from the Korea Power Exchange (KPX).

Development for Estimation Model of Runway Visual Range using Deep Neural Network (심층신경망을 활용한 활주로 가시거리 예측 모델 개발)

  • Ku, SungKwan;Hong, SeokMin
    • Journal of Advanced Navigation Technology
    • /
    • v.21 no.5
    • /
    • pp.435-442
    • /
    • 2017
  • The runway visual range affected by fog and so on is one of the important indicators to determine whether aircraft can take off and land at the airport or not. In the case of airports where transportation airplanes are operated, major weather forecasts including the runway visual range for local area have been released and provided to aviation workers for recognizing that. This paper proposes a runway visual range estimation model with a deep neural network applied recently to various fields such as image processing, speech recognition, natural language processing, etc. It is developed and implemented for estimating a runway visual range of local airport with a deep neural network. It utilizes the past actual weather observation data of the applied airfield for constituting the learning of the neural network. It can show comparatively the accurate estimation result when it compares the results with the existing observation data. The proposed model can be used to generate weather information on the airfield for which no other forecasting function is available.