• Title/Abstract/Keywords: Speech speed

241 search results (processing time: 0.026 seconds)

HMM-based Music Identification System for Copyright Protection

  • 김희동;김도현;김지환
    • Phonetics and Speech Sciences (말소리와 음성과학) / Vol. 1, No. 1 / pp. 63-67 / 2009
  • In this paper, in order to protect music copyrights, we propose a music identification system that is scalable to the number of registered pieces of music and robust to signal-level variations of the registered music. For its implementation, we define the new concepts of 'music word' and 'music phoneme' as recognition units to construct 'music acoustic models'. With these concepts, we apply the HMM-based framework used in continuous speech recognition to identify the music. Each music file is transformed into a sequence of 39-dimensional vectors. This sequence of vectors is represented as ordered states with Gaussian mixtures. These ordered states are trained using the Baum-Welch re-estimation method. Music files with a suspicious copyright are also transformed into a sequence of vectors. Then, the most probable music file is identified using the Viterbi algorithm through the music identification network. We implemented a music identification system for 1,000 MP3 music files and tested it with variations in MP3 bit rate and music speed rate. Our proposed music identification system demonstrates robust performance under signal variations. In addition, the scalability of this system is independent of the number of registered music files, since the system is based on the HMM method.
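
The identification step described above can be pictured as scoring a query sequence against each registered model with the Viterbi algorithm and keeping the best-scoring model. The following is a minimal sketch under strong simplifying assumptions: it uses toy discrete-observation, two-state left-to-right HMMs instead of the 39-dimensional Gaussian-mixture models in the abstract, and the names `viterbi_log_score`, `identify`, `song_a`, and `song_b` are hypothetical, not taken from the paper.

```python
import math

NEG_INF = float("-inf")

def _log(p):
    # Log of a probability, mapping zero to minus infinity.
    return math.log(p) if p > 0 else NEG_INF

def viterbi_log_score(obs, start, trans, emit):
    """Log-probability of the single best state path for `obs` under one HMM."""
    n = len(start)
    # Initialise with the start distribution and the first observation.
    delta = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        # For each state, keep the best predecessor and add the emission term.
        delta = [
            max(delta[p] + _log(trans[p][s]) for p in range(n)) + _log(emit[s][o])
            for s in range(n)
        ]
    return max(delta)

def identify(obs, models):
    """Return the name of the model whose best path scores highest."""
    return max(models, key=lambda name: viterbi_log_score(obs, *models[name]))

# Two toy left-to-right models over the symbols {0, 1} (made-up parameters).
# Each entry is (start probabilities, transition matrix, emission matrix).
models = {
    "song_a": ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.9, 0.1], [0.2, 0.8]]),
    "song_b": ([1.0, 0.0], [[0.7, 0.3], [0.0, 1.0]], [[0.1, 0.9], [0.8, 0.2]]),
}
print(identify([0, 0, 1, 1], models))
```

In this sketch every registered model is scored separately; the abstract instead describes a single music identification network, so this should be read as an illustration of Viterbi scoring only.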


Folded Architecture for Digital Gammatone Filter Used in Speech Processor of Cochlear Implant

  • Karuppuswamy, Rajalakshmi;Arumugam, Kandaswamy;Swathi, Priya M.
    • ETRI Journal / Vol. 35, No. 4 / pp. 697-705 / 2013
  • Emerging trends in the area of digital very large scale integration (VLSI) signal processing can lead to a reduction in the cost of the cochlear implant. Digital signal processing algorithms are repetitively used in speech processors for filtering and encoding operations. The critical paths in these algorithms limit the performance of the speech processors. These algorithms must be transformed to accommodate processors designed to be high speed and have less area and low power. This can be realized by basing the design of the auditory filter banks for the processors on digital VLSI signal processing concepts. By applying a folding algorithm to the second-order digital gammatone filter (GTF), the number of multipliers is reduced from five to one and the number of adders is reduced from three to one, without changing the characteristics of the filter. Folded second-order filter sections are cascaded with three similar structures to realize the eighth-order digital GTF whose response is a close match to the human cochlea response. The silicon area is reduced from twenty to four multipliers and from twelve to four adders by using the folding architecture.
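
The resource counts in the abstract follow from simple arithmetic: an eighth-order filter built from four cascaded second-order sections uses four times the per-section hardware. A small sketch of that calculation, with the helper name `totals` chosen here for illustration only:

```python
SECTIONS = 4  # four cascaded second-order sections give an eighth-order filter

def totals(multipliers_per_section, adders_per_section, sections=SECTIONS):
    """Total multipliers and adders for a cascade of identical sections."""
    return multipliers_per_section * sections, adders_per_section * sections

unfolded = totals(5, 3)  # per-section counts before folding, from the abstract
folded = totals(1, 1)    # per-section counts after folding, from the abstract
print(unfolded, folded)
```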

A study on the improvement of generation speed and speech quality for a granularized emotional speech synthesis system

  • 엄세연;오상신;장인선;안충현;강홍구
    • The Korean Institute of Broadcast and Media Engineers: Conference Proceedings / The Korean Institute of Broadcast and Media Engineers 2020 Summer Conference / pp. 453-455 / 2020
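  • This paper proposes a method that raises the synthesis speed of an end-to-end emotional text-to-speech (TTS) synthesis system, which generates an emotional speech caption service for the visually impaired, while also improving the quality of the synthesized speech. The previously used emotional speech synthesis method based on Global Style Tokens (GST) has the advantage of expressing a variety of emotions, but it takes a long time to generate synthesized speech, and unless the dynamic range of the training data is handled effectively, speech quality degrades, for example through clipping in the synthesized speech. To remedy this, the paper introduces a new data preprocessing step and replaces the existing vocoder, WaveNet, with WaveRNN, and shows improvements in both generation speed and speech quality.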


A Study on the Characteristics of Segmental-Feature HMM

  • 윤영선;정호영
    • Malsori (말소리), Journal of the Korean Society of Phonetic Sciences and Speech Technology (대한음성학회지) / No. 43 / pp. 163-178 / 2002
  • In this paper, we discuss the characteristics of the Segmental-Feature HMM (SFHMM) and summarize previous studies of it. Previous studies took several approaches to reducing the number of parameters. However, when the number of parameters decreased, system performance also fell. Therefore, we consider a fast computation approach that preserves the same number of parameters. We present a new segment comparison method that speeds up the computation of SFHMM without loss of performance. The proposed method uses a three-frame calculation rather than the full five frames in a given segment. The experimental results show that the performance of the proposed system is better than that of the previous studies.


The Convergence Speed Enhancement using a Cosine Modulated Filter Banks and a Decimation Technique

  • 최창권;조병모
    • The Acoustical Society of Korea: Conference Proceedings / Proceedings of the Acoustical Society of Korea 1999 Conference, Vol. 18, No. 2 / pp. 193-196 / 1999
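  • This paper proposes a method that uses cosine-modulated filter banks and decimation to improve convergence speed when modeling an acoustic impulse response, and applies it to noise cancellation. The proposed structure splits the input signal into subbands with a filter bank and then performs decimation in order to reduce the eigenvalue spread (the ratio of the largest to the smallest eigenvalue) of the filter input signal and to reduce the number of filter taps. Lowering the sampling frequency of each subband expands the signal spectrum, and feeding the resulting signal to the adaptive filter improves the convergence speed. In the experiments, for colored noise the proposed method did not achieve a better MSE (mean square error) than the LMS algorithm. When modeling a real acoustic system, it achieved nearly the same MSE, and it converged faster in all cases. Applying it to sound quality enhancement yielded improved sound quality.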
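
For context, the LMS baseline that the abstract compares against can be sketched as a system-identification loop. This is a minimal sketch of plain full-band LMS only, not of the proposed filter-bank structure; the function name `lms_identify`, the three-tap `true_system`, the step size, and the white-noise input are all assumptions made for illustration.

```python
import random

def lms_identify(x, d, num_taps, mu):
    """Adapt FIR weights w so that w applied to x tracks d (plain LMS)."""
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Most recent num_taps input samples, newest first, zero-padded at the start.
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * xk for wk, xk in zip(w, window))   # filter output
        e = d[n] - y                                     # estimation error
        w = [wk + mu * e * xk for wk, xk in zip(w, window)]  # LMS weight update
        errors.append(e)
    return w, errors

random.seed(0)
true_system = [0.5, -0.3, 0.1]  # hypothetical unknown impulse response
x = [random.uniform(-1.0, 1.0) for _ in range(5000)]  # white input signal
d = [
    sum(h * (x[n - k] if n - k >= 0 else 0.0) for k, h in enumerate(true_system))
    for n in range(len(x))
]
w, errors = lms_identify(x, d, num_taps=3, mu=0.05)
print([round(v, 3) for v in w])
```

With a white input, as here, the eigenvalue spread of the input correlation matrix is close to one and LMS converges quickly; a colored input has a larger spread and slower convergence, which is the effect the abstract's subband decomposition and decimation aim to reduce.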


An Implementation of Speaker Verification System Based on Continuants and Multilayer Perceptrons

  • Lee, Tae-Seung;Park, Sung-Won;Lim, Sang-Seok;Hwang, Byong-Won
    • Korean Institute of Intelligent Systems: Conference Proceedings / Korea Fuzzy Logic and Intelligent Systems Society, ISIS 2003 / pp. 216-219 / 2003
  • Among the techniques that protect private information by adopting biometrics, speaker verification is expected to be widely used because it is convenient to use and inexpensive to implement. Speaker verification should achieve a high degree of reliability in verification, flexibility in speech text usage, and efficiency in verification system complexity. Continuants have excellent speaker-discriminant power and a modest number of phonemes in the category, and multilayer perceptrons (MLPs) have superior recognition ability and fast operation speed. Consequently, the two provide viable ways for a speaker verification system to obtain the above properties. This paper implements a system to which continuants and MLPs are applied, and evaluates the system using a Korean speech database. The results of the experiment prove that continuants and MLPs enable the system to acquire the three properties.


An Acoustical Study of English Word Stress Produced by Americans and Koreans

  • Yang, Byung-Gon
    • Speech Sciences (음성과학) / Vol. 9, No. 1 / pp. 77-88 / 2002
  • Acoustical correlates of stress can be classified as duration, intensity and fundamental frequency. This study examined the acoustical differences in the first two syllables of stressed English words produced by ten American and Korean speakers. The Korean subjects scored very high on the TOEFL. They read at a normal speed a fable from which the acoustical parameters of eight words were analyzed. In order to make the data comparison meaningful, each parameter was collected at 100 dynamic time points proportional to the total duration of the two syllables. Then the ratio of the parameter sum of the first rime to that of the second rime was calculated to determine the relative prominence of the syllables. Results showed that the durations of the first two syllables were almost comparable between the Americans and Koreans. However, statistically significant differences showed up in the diphthong pronunciations and in the words with the second syllable stressed. Also, remarkably high r-squared values were found between pairs of the three acoustical parameters, which suggests that either one or a combination of two or more parameters may account for the prominence of a syllable within a word.
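
The time normalization and the prominence ratio described above can be sketched as follows. This is a rough illustration, assuming linear interpolation onto 100 evenly spaced points and rime boundaries given as index ranges into those points; the function names, the boundary indices, and the measurement values are made up and are not taken from the study.

```python
def resample(times, values, num_points=100):
    """Linearly interpolate a contour at evenly spaced points over its span.

    `times` must be strictly increasing and contain at least two entries.
    """
    t0, t1 = times[0], times[-1]
    out = []
    j = 0
    for i in range(num_points):
        t = t0 + (t1 - t0) * i / (num_points - 1)
        # Advance to the measured segment that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        frac = (t - times[j]) / (times[j + 1] - times[j])
        out.append(values[j] + frac * (values[j + 1] - values[j]))
    return out

def prominence_ratio(samples, first_rime, second_rime):
    """Parameter sum over the first rime divided by the sum over the second.

    Each rime is a (start, end) pair of indices into `samples`, end exclusive.
    """
    a = sum(samples[first_rime[0]:first_rime[1]])
    b = sum(samples[second_rime[0]:second_rime[1]])
    return a / b

times = [0.00, 0.10, 0.25, 0.40]      # seconds (made-up values)
intensity = [62.0, 70.0, 66.0, 60.0]  # dB (made-up values)
samples = resample(times, intensity)
print(prominence_ratio(samples, (10, 45), (60, 100)))
```

A ratio above one would indicate that the first rime carries more of the summed parameter than the second, under the assumptions stated above.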


An Analysis of the English l Sound Produced by Korean Students

  • Yang, Byung-Gon
    • Speech Sciences (음성과학) / Vol. 15, No. 1 / pp. 53-62 / 2008
  • The purpose of this study was to examine the English l sound in an English short story read by 16 Korean students in order to determine the various allophones of the sound using acoustic visual displays and perceptual judgments. The subjects read the story in a quiet office at normal speed. Each word included the lateral sound in onset or coda position and before a vowel of the following word. The results were as follows. First, there was a durational difference between the two major groups. Also, the majority of the subjects produced the clear l regardless of context. Some students produced the sound as the Korean flap or the English glide [r]. A few missing cases were also seen. The dark l was mostly produced by the English majors in coda position, with a few cases before a vowel in a phrase. Visual displays from computer analysis were very helpful in distinguishing lateral variants, but a perceptual process was sometimes necessary to judge them in fast and weak productions of the target word. Further studies would be desirable to test the discrepancies between the acoustical and perceptual decisions.


Analyzing the element of emotion recognition from speech

  • 박창현;심재윤;이동욱;심귀보
    • Korean Institute of Intelligent Systems: Conference Proceedings / Korea Fuzzy Logic and Intelligent Systems Society 2001 Fall Conference Proceedings / pp. 199-202 / 2001
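  • In general, the elements from which human emotion can be recognized in a speech signal are (1) the words used in the content of the conversation, (2) tone, (3) the pitch of the speech signal, (4) formant frequency, (5) speech speed, and (6) voice quality. For humans, it is natural to perceive emotion through tone, words, speed, and voice quality rather than through analytic elements such as frequency, so the latter elements can naturally serve as important factors in classifying emotion. Conventionally, mainly the latter elements have been used, but for a machine implementation it helps to be able to use the more engineering-oriented formant frequency. This research therefore aims to implement an emotion recognition system that uses pitch, formants, and speech speed from the speech signal. As the first stage of that research, this paper finds the distinctive characteristics of two extreme emotions, based on words uttered in anger and words used briefly in moments of joy.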


N-gram Based Robust Spoken Document Retrievals for Phoneme Recognition Errors

  • 이수장;박경미;오영환
    • Malsori (말소리), Journal of the Korean Society of Phonetic Sciences and Speech Technology (대한음성학회지) / No. 67 / pp. 149-166 / 2008
  • In spoken document retrieval (SDR), subword indexing terms (typically phonemes) are used to avoid the out-of-vocabulary (OOV) problem. This makes the indexing and retrieval process independent of any vocabulary, and it requires only a small corpus to train the acoustic model. However, the subword indexing approach has a major drawback: it shows higher word error rates than a large vocabulary continuous speech recognition (LVCSR) system. In this paper, we propose a probabilistic slot detection and n-gram based string matching method for phone-based spoken document retrieval to overcome the high error rates of the phone recognizer. Experimental results show a 9.25% relative improvement in mean average precision (mAP) with a 1.7-fold speed-up in comparison with the baseline system.
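
To illustrate why n-gram matching tolerates phone recognition errors, the sketch below scores a query by the fraction of its phone bigrams that also appear in a recognized document. It is a generic n-gram overlap measure, not the paper's probabilistic slot detection method; the function names and the phone sequences are hypothetical.

```python
from collections import Counter

def ngrams(phones, n):
    """All contiguous phone n-grams in a sequence, as tuples."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]

def ngram_score(query, document, n=2):
    """Fraction of the query's phone n-grams that also occur in the document."""
    q = Counter(ngrams(query, n))
    d = Counter(ngrams(document, n))
    if not q:
        return 0.0
    # Count each query n-gram at most as often as it appears in the document.
    matched = sum(min(count, d[gram]) for gram, count in q.items())
    return matched / sum(q.values())

query = ["s", "p", "iy", "ch"]
doc_hit = ["dh", "ax", "s", "p", "iy", "ch", "w", "ax", "z"]
doc_err = ["dh", "ax", "s", "b", "iy", "ch", "w", "ax", "z"]  # one phone misrecognized
print(ngram_score(query, doc_hit), ngram_score(query, doc_err))
```

An exact string match would miss the query entirely in `doc_err`, whereas the bigram score stays above zero, so a document with a single substitution error can still be ranked.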
