Integrated Search | Korea Science

Speaker Recognition Using LPC Cepstrum Coefficients and Neural Network

최재승
- Journal of the Korea Institute of Information and Communication Engineering / Vol. 15, No. 12 / pp. 2521-2526 / 2011
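This paper proposes a speaker recognition algorithm that uses a perceptron neural network and linear predictive coding (LPC) cepstrum coefficients. The proposed algorithm first extracts the voiced segments of the input speech signal. Linear prediction analysis is then applied to the extracted voiced segments to obtain LPC cepstrum coefficients, which carry the speaker's characteristics. To classify them, the cepstrum coefficients are fed to the perceptron neural network as inputs and the network is trained. The experiments confirm, through the recognition rate, that the proposed speaker recognition algorithm using LPC cepstrum coefficients and a neural network is effective.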
https://doi.org/10.6109/jkiice.2011.15.12.2521
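The abstract above describes obtaining LPC cepstrum coefficients from linear prediction analysis of voiced speech. The following is a minimal sketch of that feature-extraction step only, not the paper's implementation: the function names (`autocorr`, `levinson_durbin`, `lpc_to_cepstrum`), the autocorrelation method, and the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p are assumptions made for illustration. The abstract does not state the analysis order, the windowing, or how voiced segments are detected, so none of those are modeled here.

```python
def autocorr(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Levinson-Durbin recursion (assumed solver).

    Returns (a, err) where a[0] = 1 and A(z) = 1 + sum_k a[k] z^-k is the
    prediction-error filter, and err is the residual energy.
    Assumes r[0] > 0 (a non-silent frame).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1/A(z).

    Uses the standard recursion
        c[n] = -a[n] - (1/n) * sum_{k} k * c[k] * a[n-k],
    with a[n] taken as 0 for n greater than the LPC order.
    """
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = -(a[n] if n <= p else 0.0) - acc / n
    return c[1:]
```

In a pipeline like the one described, each voiced frame would yield one cepstrum vector (for example `lpc_to_cepstrum(levinson_durbin(autocorr(frame, p), p)[0], n_ceps)`, with `p` and `n_ceps` chosen by the user), and those vectors would serve as the classifier inputs.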

An SNR Scalable Video Coding using Linearly Combined Motion Vectors

Ryu, Chang-Hoon;Byoungjun Han;Park, Kwang-Pyo;Yoon, Eung-Sik;Lee, Keun-Young
- Institute of Electronics Engineers of Korea: Conference Proceedings / Institute of Electronics Engineers of Korea, 2002 ITC-CSCC -1 / pp. 50-53 / 2002
There is an increasing need to deliver multimedia streaming over heterogeneous networks. Considering the network environments and equipment accessed by users, the delivery of video streams must be scalable. There are many kinds of scalable video coding: spatial, temporal, SNR, and hybrid. SNR scalable coding keeps the same spatial resolution across layers but provides a different SNR quality for each layer. The 1-layer SNR scalable encoder produces SNR scalable video streams with ease, but it suffers from a drift problem. The modified 1-layer approach does not have this problem, but it is inefficient in coding and is not MPEG-compliant. The existing MPEG-compliant 2-layer encoder was introduced to reduce the coding rate, but it still uses only the base layer to encode all layers. In this paper, we propose an adaptive MPEG-compliant 2-layer encoder. Using a linear combination algorithm, the encoder uses one motion vector to encode the sequences efficiently. By doing this, we can improve the coding efficiency of SNR scalable coding.

Network Coding-Based Fault Diagnosis Protocol for Dynamic Networks

Jarrah, Hazim;Chong, Peter Han Joo;Sarkar, Nurul I.;Gutierrez, Jairo
- KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 4 / pp. 1479-1501 / 2020
Dependable functioning of dynamic networks is essential for delivering ubiquitous services. Faults are the root causes of network outages. The comparison diagnosis model, which automates fault identification, is one of the leading approaches to attaining network dependability. Most existing research has focused on stationary networks. Nonetheless, the time-free comparison model imposes no time constraints on the system under consideration, and it suits most of the diagnosis requirements of dynamic networks. This paper presents a novel protocol that diagnoses faulty nodes in diagnosable dynamic networks. The proposed protocol comprises two stages: a testing stage, which uses the time-free comparison model to diagnose faulty neighbour nodes, and a disseminating stage, which leverages a Random Linear Network Coding (RLNC) technique to disseminate the partial views of nodes. We analysed and evaluated the performance of the proposed protocol under various scenarios, considering two metrics: communication overhead and diagnosis time. The simulation results revealed that the proposed protocol diagnoses different types of faults in dynamic networks. Compared with most related protocols, our proposed protocol has very low communication overhead and diagnosis time. These results demonstrated that the proposed protocol is energy-efficient, scalable, and robust.
https://doi.org/10.3837/tiis.2020.04.005

Artificial Speech Bandwidth Extension Technique Based on Opus Codec Using Deep Belief Network

최윤상;이아성;강상원
- The Journal of the Acoustical Society of Korea / Vol. 36, No. 1 / pp. 70-77 / 2017
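Bandwidth extension is a technique that extends a narrowband speech signal in the 300 ~ 3,400 Hz band to a wideband speech signal in the 50 ~ 7,000 Hz band, improving sound quality, intelligibility, and naturalness. In this paper, we design an artificial bandwidth extension technique that estimates the wideband speech signal from narrowband speech information and embed it in the Opus audio decoder, which reduces the computation associated with LPC (Linear Prediction Coding) analysis and LSF (Line Spectral Frequencies) analysis in the bandwidth extension module and also reduces the algorithmic delay. To this end, we apply the Deep Belief Network (DBN), one of the deep learning techniques now used in many fields, to spectral envelope extension, and obtain spectra of better quality than the conventional codebook mapping method.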
https://doi.org/10.7776/ASK.2017.36.1.070

Speaker Verification Using Hidden LMS Adaptive Filtering Algorithm and Competitive Learning Neural Network

조성원;김재민
- The Transactions of the Korean Institute of Electrical Engineers D (Systems and Control) / Vol. 51, No. 2 / pp. 69-77 / 2002
Speaker verification can be classified into two categories: text-dependent speaker verification and text-independent speaker verification. In this paper, we discuss text-dependent speaker verification. A text-dependent speaker verification system determines whether the sound characteristics of the speaker match those of a specific person. In this paper, we obtain the speaker data using a sound card under various noisy conditions, apply a new Hidden LMS (Least Mean Square) adaptive algorithm to it, and extract LPC (Linear Predictive Coding) cepstrum coefficients as feature vectors. Finally, we use a competitive learning neural network for speaker verification. The proposed hidden LMS adaptive filter using a neural network reduces noise and enhances features under various noisy conditions. We construct a separate neural network for each speaker, which makes it unnecessary to retrain the whole network when a new speaker is added and makes the system easy to expand. We show experimentally that the proposed method improves speaker verification performance.

Speech Recognition of Multi-Syllable Words Using Soft Computing Techniques

이종수;윤지원
- Transactions of the Society of Information Storage Systems / Vol. 6, No. 1 / pp. 18-24 / 2010
The performance of speech recognition depends mainly on uncertain factors such as the speaker's condition and environmental effects. The present study deals with the speech recognition of a number of multi-syllable isolated Korean words using soft computing techniques such as a back-propagation neural network, a fuzzy inference system, and a fuzzy neural network. Feature patterns for the speech recognition consist of thirty normalized frames of 12th-order coefficients obtained by linear predictive coding and cepstrum analysis. Using four speech recognizer models, experiments are conducted for both single-speaker and multiple-speaker cases. In this study, the recognizer combining fuzzy logic with a back-propagation neural network and the fuzzy neural network recognizer show the better recognition performance.

Electroencephalogram-based Driver Drowsiness Detection System Using AR Coefficients and SVM

한형섭;정의필
- Journal of Korean Institute of Intelligent Systems / Vol. 22, No. 6 / pp. 768-773 / 2012
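Driver drowsiness is a major cause of fatal traffic accidents and can be even more dangerous than drunk driving. For this reason, developing systems that detect driver drowsiness and issue a warning has recently become a very important issue. Analysis of biosignals, which are most closely related to drowsiness, is widely applied, and studies that analyze the electroencephalogram (EEG) and the electrooculogram (EOG) form the mainstream. In this paper, EEG measured according to an experimental protocol is analyzed by frequency band to build an EEG database for each driver state, and a driver drowsiness detection system using linear predictive coding (LPC) coefficients and a Support Vector Machine (SVM) is proposed. In the experiments, EEG analysis of drowsiness showed a trend of decreasing alpha waves and increasing theta waves, and the LPC coefficients reflected the characteristics of the awake, drowsy, and sleeping states well. In particular, the proposed system achieved a high classification result of 96.5% even with short samples (250 ms), showing the possibility of predicting in real time the sudden situations that can arise in an instant while driving.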
https://doi.org/10.5391/JKIIS.2012.22.6.768

Action Recognition with deep network features and dimension reduction

Li, Lijun;Dai, Shuling
- KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 2 / pp. 832-854 / 2019
Action recognition has been studied in the computer vision field for years. We present an effective approach to recognizing actions using a dimension reduction method, which is applied as a crucial step to reduce the dimensionality of feature descriptors after feature extraction. We use a sparse matrix and a randomized kd-tree to modify Local Fisher Discriminant Analysis (LFDA), yielding a modified LFDA (mLFDA) method that greatly reduces the required memory and accelerates the standard LFDA. For feature encoding, we propose an encoding method called mix encoding, which combines Fisher vector encoding and locality-constrained linear coding to obtain the final video representations. To add more meaningful features to the action recognition process, a convolutional neural network is used and combined with mix encoding to produce the deep network feature. Experimental results show that our algorithm is competitive on the KTH, HMDB51, and UCF101 datasets when all these methods are combined.
https://doi.org/10.3837/tiis.2019.02.019

Performance Evaluation of Wavelet-based ECG Compression Algorithms over CDMA Networks

김병수;유선국
- The Transactions of the Korean Institute of Electrical Engineers D (Systems and Control) / Vol. 53, No. 9 / pp. 663-669 / 2004
Mobile tele-cardiology is a new research area that supports ubiquitous health care based on mobile telecommunication networks. Although many studies present modeling concepts for a GSM-based mobile telemedical system, practical applications need to consider both compression performance and error corruption in the mobile environment. This paper evaluates three wavelet ECG compression algorithms over CDMA networks. The three selected methods are Rajoub using EPE thresholding, Embedded Zerotree Wavelet (EZW), and Wavelet transform Higher Order Statistics Coding (WHOSC) with linear prediction. For all methods, the more significant information was protected using Forward Error Correction coding, and we measured not only compression performance in the noise-free case but also error robustness and delay profile in the CDMA environment. In addition, from the field test we analyzed the PRD with respect to movement speed and the features of CDMA 1X. The test results show that Rajoub has low robustness under high error rates and that EZW makes more efficient use of variable bandwidth under high error rates. WHOSC has high robustness across the overall BER range but loses performance for particular abnormal ECGs.
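The abstract above reports PRD without expanding it. In ECG compression work PRD usually denotes the percent root-mean-square difference between the original and the reconstructed signal. The sketch below assumes that common definition, without mean removal; the abstract does not say which PRD variant the authors used, and the function name `prd` is hypothetical.

```python
import math


def prd(original, reconstructed):
    """Percent root-mean-square difference (assumed variant: no mean removal).

    PRD = 100 * sqrt( sum((x - y)^2) / sum(x^2) )
    where x is the original signal and y the reconstructed one.
    Assumes the original signal is not all zeros.
    """
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    num = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    den = sum(x * x for x in original)
    return 100.0 * math.sqrt(num / den)
```

A perfect reconstruction gives a PRD of 0, and a reconstruction that is all zeros gives 100, so lower values indicate less distortion.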

A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments

이길호;윤재삼;오유리;김홍국
- Malsori (Journal of the Korean Society of Phonetic Sciences and Speech Technology) / No. 54 / pp. 27-43 / 2005
Existing standard speech coders can provide high-quality speech communication, but they degrade the performance of speech recognition systems that use the speech reconstructed by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized for speech quality rather than for speech recognition performance. For example, the mel-frequency cepstral coefficient (MFCC) is generally known to provide better speech recognition performance than the linear prediction coefficient (LPC), which is a typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCC instead of LPC to improve the performance of a server-based speech recognition system in network environments. The main difficulty in using MFCC, however, is developing an efficient MFCC quantization at a low bit rate. First, we explore the interframe correlation of MFCCs, which leads to predictive quantization of MFCC. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. A PESQ test shows that the proposed speech coder has speech quality comparable to that of 8 kbps G.729, while speech recognition using the proposed coder performs better than recognition using G.729.

