Search | Korea Science

Automated Call Routing Call Center System Based on Speech Recognition (음성인식을 이용한 고객센터 자동 호 분류 시스템)

Shim, Yu-Jin;Kim, Jae-In;Koo, Myung-Wan
- Speech Sciences / v.12 no.2 / pp.183-191 / 2005
This paper describes automated call routing for a call center system based on speech recognition. We focus on the task of automatically routing telephone calls based on a user's fluently spoken response instead of touch-tone menus in an interactive voice response system. A vector-based call routing algorithm is investigated and a normalization method is suggested. A call center database collected by KT is used for the call routing experiments. Experimental results evaluating call classification from transcribed speech are reported for that database. With a small amount of training data, an average call routing error reduction rate of 9% is observed when the normalization method is used.
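As a rough illustration of what vector-based call routing generally looks like, the sketch below represents each destination as a bag-of-words term vector and routes an utterance to the destination with the highest cosine similarity. The abstract does not specify the paper's algorithm details or its normalization method, so the cosine length normalization, the function names (`train`, `route`), and the toy English training sentences are all assumptions made for illustration only.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term counts for a transcribed utterance."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity; dividing by both vector lengths is the
    (assumed) normalization step in this sketch."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def train(examples):
    """Accumulate one term vector per destination from (text, destination) pairs."""
    routes = {}
    for text, dest in examples:
        routes.setdefault(dest, Counter()).update(term_vector(text))
    return routes

def route(routes, utterance):
    """Return the destination whose term vector is closest to the utterance."""
    query = term_vector(utterance)
    return max(routes, key=lambda dest: cosine(query, routes[dest]))

# Hypothetical toy training data, not taken from the KT database.
EXAMPLES = [
    ("i want to check my bill", "billing"),
    ("question about my monthly bill", "billing"),
    ("my internet connection is down", "repair"),
    ("the phone line is broken", "repair"),
]
ROUTES = train(EXAMPLES)
```

Calling `route(ROUTES, "my bill is too high")` selects the destination whose accumulated vocabulary overlaps most with the request after length normalization, which keeps destinations with many training utterances from dominating purely by size.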

Multiple Acoustic Cues for Stop Recognition

Yun, Weon-Hee
- Proceedings of the KSPS conference / 2003.10a / pp.3-16 / 2003
- Acoustic characteristics of stops in speech with contextual variability
- Possibility of stop recognition by a post-processing technique
- Further work:
  - Speech database
  - Modification of decoder
  - Automatic segmentation of acoustic parameters

Acoustic Model Transformation Method for Speech Recognition Employing Gaussian Mixture Model Adaptation Using Untranscribed Speech Database (미전사 음성 데이터베이스를 이용한 가우시안 혼합 모델 적응 기반의 음성 인식용 음향 모델 변환 기법)

Kim, Wooil
- Journal of the Korea Institute of Information and Communication Engineering / v.19 no.5 / pp.1047-1054 / 2015
This paper presents an acoustic model transform method that uses an untranscribed speech database to improve speech recognition. In the presented method, an adapted GMM is obtained with a conventional adaptation method, and the most similar Gaussian component is selected from the adapted GMM. The bias vector between the mean vectors of the clean GMM and the adapted GMM is used to update the mean vectors of the HMM. The presented GAMT, combined with MAP or MLLR, improves speech recognition performance in car noise and speech babble conditions compared to MAP or MLLR used alone. The experimental results show that the presented model transform method effectively utilizes an untranscribed speech database for acoustic model adaptation to increase speech recognition accuracy.
https://doi.org/10.6109/jkiice.2015.19.5.1047

Speech emotion recognition based on genetic algorithm-decision tree fusion of deep and acoustic features

Sun, Linhui;Li, Qiu;Fu, Sheng;Li, Pingan
- ETRI Journal / v.44 no.3 / pp.462-475 / 2022
Although researchers have proposed numerous techniques for speech emotion recognition, its performance remains unsatisfactory in many application scenarios. In this study, we propose a speech emotion recognition model based on a genetic algorithm (GA)-decision tree (DT) fusion of deep and acoustic features. To express speech emotional information more comprehensively, frame-level deep and acoustic features are first extracted from a speech signal. Next, five kinds of statistical variables of these features are calculated to obtain utterance-level features. The Fisher feature selection criterion is employed to select high-performance features, removing redundant information. In the feature fusion stage, the GA is used to adaptively search for the best feature fusion weight. Finally, using the fused feature, the proposed speech emotion recognition model based on a DT support vector machine model is realized. Experimental results on the Berlin speech emotion database and the Chinese emotion speech database indicate that the proposed model outperforms an average weight fusion method.
https://doi.org/10.4218/etrij.2020-0458

Speech synthesis engine for car navigation systems (차량항법 시스템을 위한 소형 음성합성 엔진)

김경하;서흥석;박찬식;성태경;이상정
- Institute of Control, Robotics and Systems: Conference Proceedings / 2000.10a / pp.338-338 / 2000
This paper proposes a modified TD-PSOLA algorithm for Korean speech synthesis. A WSS (weighted score search) algorithm is proposed for pitch detection, and a speech synthesis engine is designed using a 46-phone database.

An Enhancement of Japanese Acoustic Model using Korean Speech Database (한국어 음성데이터를 이용한 일본어 음향모델 성능 개선)

Lee, Minkyu;Kim, Sanghun
- The Journal of the Acoustical Society of Korea / v.32 no.5 / pp.438-445 / 2013
In this paper, we propose an enhancement of a Japanese acoustic model that is trained with a Korean speech database by using several combination strategies. We describe the strategies for training with combinations of multiple languages: Cross-Language Transfer, Cross-Language Adaptation, and the Data Pooling Approach. We simulated those strategies and found a proper method for our current Japanese database. Existing combination strategies are generally verified for under-resourced language environments, but they were confirmed to be inappropriate when the speech database is not fully under-resourced. In the Data Pooling Approach training process, we made the tied-list with only the object language. As a result, we found the ERR of the acoustic model to be 12.8%.
https://doi.org/10.7776/ASK.2013.32.5.438

A Fixed Rate Speech Coder Based on the Filter Bank Method and the Inflection Point Detection

Iem, Byeong-Gwan
- International Journal of Fuzzy Logic and Intelligent Systems / v.16 no.4 / pp.276-280 / 2016
A fixed-rate speech coder based on the filter bank and a non-uniform sampling technique is proposed. The non-uniform sampling is achieved by the detection of inflection points (IPs). A speech block is band-pass filtered by the filter bank, the subband signals are processed by the IP detector, and the detected IP patterns are compared with entries of the IP database. For each subband signal, the address of the closest member of the database and the energy of the IP pattern are transmitted through the channel. In the receiver, the decoder recovers the subband signals using the received addresses and the energy information, and reconstructs the speech via the filter bank summation. As a result, the coder shows a fixed data rate, in contrast to existing speech coders based on non-uniform sampling. The usefulness of the proposed technique is confirmed through computer simulation. The signal-to-noise ratio (SNR) performance of the proposed method is comparable to that of uniformly sampled pulse code modulation (PCM) below a 20 kbps data rate.
https://doi.org/10.5391/IJFIS.2016.16.4.276

Design and Construction of Korean-Spoken English Corpus(K-SEC) (한국인의 영어 음성 코퍼스 설계 및 구축)

Rhee Seok-Chae;Lee Sook-Hyang;Kang Seok-keun;Lee Yong-Ju
- MALSORI / no.46 / pp.159-174 / 2003
K-SEC (Korean-Spoken English Corpus) is a speech database that is under construction by the authors of this paper. This article discusses the need for the K-SEC in various academic disciplines and industrial circles, and it introduces the characteristics of the K-SEC design, its catalogues, and the contents of the recorded database, exemplifying what is being considered from the phonetics and phonology of both Korean and English. The K-SEC can be regarded as the beginning of a parallel speech corpus, and it is suggested that similar corpora be enlarged for the future advancement of experimental phonetics and speech information technology.

A Robust Non-Speech Rejection Algorithm

Ahn, Young-Mok
- The Journal of the Acoustical Society of Korea / v.17 no.1E / pp.10-13 / 1998
We propose a robust non-speech rejection algorithm using three types of pitch-related parameters: (1) pitch range, (2) difference of the successive pitch range, and (3) the number of successive pitches satisfying constraints related to the previous two parameters. The acceptance rate of the speech commands was 95% for a -2.8 dB signal-to-noise ratio (SNR) speech database that consisted of 2440 utterances. The rejection rate of the non-speech sounds was 100%, while the acceptance rate of the speech commands was 97% in an office environment.

A Study on the Endpoint Detection by FIR Filtering (FIR filtering에 의한 끝점추출에 관한 연구)

Lee, Chang-Young
- Speech Sciences / v.5 no.1 / pp.81-88 / 1999
This paper provides a method for speech detection. After first-order FIR filtering of the speech signals, we applied the conventional method of endpoint detection, which uses energy as the criterion for separating signals from background noise. With FIR filtering, only the Fourier components with large values of [amplitude × frequency] become significant in the energy profile. By applying this procedure to the 445-word database constructed by ETRI, we confirmed that low-amplitude noise and/or low-frequency noise are clearly separated from the speech signals, thereby enhancing the feasibility of ideal endpoint detection.
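The idea in this abstract can be sketched in a few lines. The example below applies a first-order FIR filter, then thresholds per-frame energy to find the first and last active frames. The abstract does not give the filter coefficient, frame length, or threshold, so the difference filter `y[n] = x[n] - a*x[n-1]` with `a = 1.0`, the 80-sample frames, the threshold of 0.01, and the synthetic test signal are all assumptions chosen for illustration, not the paper's settings.

```python
import math

def fir_first_order(x, a=1.0):
    """First-order FIR filter y[n] = x[n] - a * x[n-1], taking x[-1] = 0.
    With a = 1 a sinusoid's amplitude is scaled roughly in proportion to
    its frequency (for frequencies well below the sampling rate)."""
    prev = 0.0
    out = []
    for s in x:
        out.append(s - a * prev)
        prev = s
    return out

def frame_energies(x, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [sum(s * s for s in x[i:i + frame_len]) / frame_len
            for i in range(0, len(x) - frame_len + 1, frame_len)]

def detect_endpoints(x, frame_len=80, threshold=0.01, a=1.0):
    """Return (start, end) sample indices of the span between the first and
    last frames whose filtered energy exceeds the threshold, or None."""
    energies = frame_energies(fir_first_order(x, a), frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len

def synthetic_signal(rate=8000):
    """Hypothetical test input: 0.3 s of 50 Hz hum, with a 1 kHz tone
    standing in for speech during the middle 0.1 s."""
    n = rate // 10
    sig = []
    for k in range(3 * n):
        t = k / rate
        s = 0.3 * math.sin(2 * math.pi * 50 * t)
        if n <= k < 2 * n:
            s += 0.3 * math.sin(2 * math.pi * 1000 * t)
        sig.append(s)
    return sig
```

On this synthetic input, setting `a=0.0` disables the filter and lets the low-frequency hum exceed the energy threshold in every frame, whereas the default `a=1.0` suppresses the hum so that only the frames containing the higher-frequency tone remain active.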

Search Result 331, Processing Time 0.026 seconds
