• Title/Abstract/Keywords: Diphone

Korean Word Recognition Using Diphone-Level Hidden Markov Model

  • 박현상;은종관;박용규;권오욱
    • 한국음향학회지 (The Journal of the Acoustical Society of Korea) / Vol. 13, No. 1 / pp. 14-23 / 1994
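  • This paper studies speech recognition units suited to Korean speech recognition. To build a good speech recognition system, one must choose a recognition unit that can handle the coarticulation present in spoken utterances. This calls for a unit conceptually larger than the phoneme, and the diphone is a good candidate because it models the transition region between phonemes. When diphones are used as the recognition unit, stable phoneme regions can also be inserted between diphones. In isolated-word recognition experiments on 74 words spoken by 7 male speakers, the highest recognition rate was obtained when each diphone was modeled by a 2-state HMM and each phoneme, except the plosives 'ㅂ', 'ㄷ', 'ㄱ' and silence, by a 1-state HMM. Merging rarely occurring diphones into a single unit raised the recognition rate from $93.98\%$ to $96.29\%$. Using the merged diphones together with the proposed local interpolation technique raised it further to $97.22\%$.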
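The unit scheme described in this abstract (diphones for transitions, with optional stable phoneme units in between) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `to_diphone_units`, the `sil` label and the unit naming `a-b` are assumptions made for illustration, and the paper's actual unit inventory is not given in the abstract.

```python
from typing import FrozenSet, List

SILENCE = "sil"  # hypothetical label for the silence around an isolated word

def to_diphone_units(phonemes: List[str],
                     insert_steady: bool = False,
                     skip_steady: FrozenSet[str] = frozenset()) -> List[str]:
    """Turn a phoneme sequence into a sequence of recognition units.

    Each diphone "a-b" stands for the transition from phoneme a to phoneme b.
    With insert_steady=True, a steady-state unit for phoneme b follows every
    diphone ending in b, unless b is silence or listed in skip_steady.
    """
    padded = [SILENCE] + list(phonemes) + [SILENCE]
    units: List[str] = []
    for left, right in zip(padded, padded[1:]):
        units.append(f"{left}-{right}")
        if insert_steady and right != SILENCE and right not in skip_steady:
            units.append(right)
    return units

# Example: the jamo sequence ㅂ ㅏ ㄷ ㅏ, skipping steady units for the plosives.
PLOSIVES = frozenset({"ㅂ", "ㄷ", "ㄱ"})
units = to_diphone_units(["ㅂ", "ㅏ", "ㄷ", "ㅏ"], insert_steady=True,
                         skip_steady=PLOSIVES)
print(units)
```

In a recognizer along these lines, each diphone unit would map to its own small HMM and each steady unit to a one-state HMM, so the word model is the concatenation of the unit models in this order.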

'Hanmal' Korean Language Diphone Database for Speech Synthesis

  • Chung, Hyun-Song
    • 음성과학 (Speech Sciences) / Vol. 12, No. 1 / pp. 55-63 / 2005
  • This paper introduces 'Hanmal', a Korean diphone database for speech synthesis, which has been publicly available on the MBROLA website since 1999 but has never been properly published in a journal. The diphone database is compatible with the MBROLA programme of high-quality multilingual speech synthesis systems. The paper explains the usefulness of the database and describes its phonetic and phonological structure, showing the process of creating the text corpus. A machine-readable Korean SAMPA convention for the control data input to the MBROLA application is also suggested. Diphone concatenation and prosody manipulation are performed using the MBR-PSOLA algorithm. A set of segment duration models can be applied to the diphone synthesis of Korean.
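As a rough illustration of what "control data input" to MBROLA looks like, the sketch below writes lines in the `.pho` layout: a phoneme symbol, a duration in milliseconds, then optional pairs of position (percent of the segment) and pitch (Hz). The helper names `pho_line` and `to_pho` are hypothetical, and the symbol `a` is a placeholder rather than an entry from the paper's Korean SAMPA table; `_` is the usual MBROLA silence symbol.

```python
from typing import Iterable, Sequence, Tuple

PitchPoint = Tuple[int, int]                      # (position in %, F0 in Hz)
Segment = Tuple[str, int, Sequence[PitchPoint]]   # (symbol, duration in ms, pitch points)

def pho_line(symbol: str, duration_ms: int,
             pitch_points: Sequence[PitchPoint] = ()) -> str:
    """Format one segment as a .pho line: symbol, duration, pitch pairs."""
    parts = [symbol, str(duration_ms)]
    for position_pct, f0_hz in pitch_points:
        parts += [str(position_pct), str(f0_hz)]
    return " ".join(parts)

def to_pho(segments: Iterable[Segment]) -> str:
    """Join segments into the text of a .pho file, one segment per line."""
    return "\n".join(pho_line(*seg) for seg in segments) + "\n"

# Placeholder utterance: silence, one vowel with a falling pitch, silence.
segments = [
    ("_", 100, ()),
    ("a", 120, [(0, 120), (100, 110)]),
    ("_", 100, ()),
]
pho_text = to_pho(segments)
print(pho_text, end="")
```

A segment duration model such as the one mentioned in the abstract would supply the `duration_ms` values; the synthesizer then concatenates the diphones implied by adjacent symbols.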

A Study on the Diphone Recognition of Korean Connected Words and Eojeol Reconstruction

  • 김경선;정홍
    • 한국음향학회지 (The Journal of the Acoustical Society of Korea) / Vol. 14, No. 4 / pp. 46-63 / 1995
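  • This paper describes an unlimited-vocabulary Korean connected-word recognition system based on time-delay neural networks (TDNNs). The recognition unit is the diphone, which contains the transition between two adjacent phonemes; 329 diphones are used. Korean connected-word recognition is divided into three stages: feature extraction from the speech signal, diphone recognition, and post-processing. In the feature extraction stage, the diphone segments of the input speech are separated and 16th-order filter-bank coefficients are computed. Diphone recognition has a three-level hierarchical structure and uses a total of 30 TDNNs. In particular, the TDNNs used here modify the conventional TDNN architecture to raise the recognition rate. The post-processing stage consists of correcting misrecognized diphones using phoneme transition probabilities and phoneme confusion probabilities, and of combining the recognized diphones to form eojeols.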

Definition and Function of Two Song Types of the Bush Warbler (Cettia diphone borealis)

  • Shi-Ryong Park;Eui-Dong Han;Ha-Cheol Sung
    • Animal Cells and Systems / Vol. 3, No. 2 / pp. 149-151 / 1999
  • It has been suggested that the bush warbler (Cettia diphone borealis) uses different song types in various situations. We analyzed song features and conducted playback experiments in order to reveal the function of the songs of the bush warbler. Two song types were identified. The short song type has a shorter song duration than the normal song type and consists of only one or two syllables. Because of its short syllables and the low amplitude of its whistle portion, we were able to discriminate the short song type (S song type) from the normal song type (N song type). In the playback experiments, bush warblers sang the short song type at high rates for the first three minutes after playback. After 6 minutes of playback, males changed to singing normal songs. These results suggest that the short song of the bush warbler may function to threaten or drive off intruding males.

Implementation of Vocabulary-Independent Speech Recognizer Using a DSP

  • 정익주
    • 음성과학 (Speech Sciences) / Vol. 11, No. 3 / pp. 143-156 / 2004
  • In this paper, we implemented a vocabulary-independent speech recognizer using the TMS320VC33 DSP. For this implementation, we developed a very small recognition engine based on the diphone sub-word unit, which is especially suited to embedded applications where system resources are severely limited. With 1 mixture per state and 4 states per diphone, the recognizer achieves 94.5% accuracy when tested on a set of 2,000 frequently used words. The hardware design focused on minimizing the number of parts, which reduces material cost. The final hardware includes only a DSP, a 512 Kword flash ROM and a voice codec. In porting the recognition engine to the DSP, we introduced several methods for using data and program memory efficiently and developed a versatile software protocol for the host interface. Finally, we also made an evaluation board for testing the developed hardware recognition module.

A Song Transition among the Geographic Populations of Bush Warbler (Cettia diphone)

  • Park, Dae Sik;Sooil Kim;Shi-Ryong Park
    • The Korean Journal of Ecology / Vol. 19, No. 2 / pp. 141-149 / 1996
  • This study examined the occurrence of geographic song variation and the direction of its transition among bush warbler populations distributed in Korea and Japan. Bush warbler songs (n=283) of 25 males from Cheongwon and Jeju, Korea, and from Chiba, Japan were analyzed. Chiba individuals had more song types, a higher dominant frequency and a longer introductory whistle portion than Cheongwon and Jeju individuals. Across the eight song parameters measured, the values showed a consistently decreasing or increasing tendency. This tendency followed the geographic location from Chiba to Cheongwon. The difference in song parameters was greatest between the Cheongwon and Chiba populations, compared with the other pairs of geographic populations. The degree of discrimination among the three populations was 92.00%. These results indicate that there is geographic song variation between the bush warblers of Japan and Korea, and that the song transition has been directed from Chiba (Japan) through Jeju to Cheongwon (Korea).

Behavioral Function of the Anomalous Song in the Bush Warbler, Cettia diphone

  • Park, Shi-Ryong;Cheong, Seok-Wan;Chung, Hoon
    • Animal Cells and Systems / Vol. 8, No. 2 / pp. 89-95 / 2004
  • Bush warblers (Cettia diphone) are known to have two types of songs: a normal song that plays a role in attracting mates and defending territory, and an anomalous song. The present study suggests that the anomalous song functions as an alarm signal as well as other unknown signals. Field observations and playback experiments on the anomalous song of the bush warbler were conducted in order to investigate the contextual information exchanged between sender and receiver. In the field observations, the males frequently emitted anomalous songs toward potential predators. The males also responded with an anomalous song to stuffed potential predators. The distance between the place where the anomalous song was sung and the stimulating source varied depending on the kind of stimulus. Male bush warblers possibly show different responses to the anomalous song depending on the level of danger. When the anomalous song was played back to terrestrial males and females, no distinctive behavior was observed. The anomalous song may be sung to defend the territory against predators or to distract invaders from the nest and female, because the male and female behaviors were related to the anomalous song and its phonetic characteristics.

Design and Implementation of Korean Text-to-Speech System

  • 정준구
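    • Acoustical Society of Korea: Conference Proceedings / Proceedings of the 11th Workshop on Speech Communication and Signal Processing, Acoustical Society of Korea, 1994 (SCAS Vol. 11, No. 1) / pp. 91-94 / 1994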
  • This paper studies the design and implementation of a Korean text-to-speech system. Parametric synthesis is chosen as the speech synthesis method, and PARCOR coefficients, obtained by LPC analysis, are used as the acoustic parameters. The diphone is used as the synthesis unit because it retains the basic naturalness of human speech. The diphone database consists of 1228 PCM files. LPC synthesis has the drawback of reducing the clarity of synthesized speech when synthesizing unvoiced sounds. In this paper, we improve the clarity of synthesized speech by using the residual signal as the excitation signal for unvoiced sounds. In addition, to improve naturalness, we control the prosody of the synthesized speech by controlling the energy and pitch patterns. The synthesis system is implemented on a PC/486 and uses a 70 Hz-4.5 kHz band-pass filter for speech input/output, an amplifier, and a TMS320C30 DSP board.
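PARCOR (reflection) coefficients are commonly obtained as a by-product of the Levinson-Durbin recursion on a frame's autocorrelation. The sketch below shows that textbook recursion in plain Python; it is not taken from the paper, the function names are illustrative, and the sign convention for the reflection coefficients differs between texts (here a positive value means positive correlation with the past sample).

```python
from typing import List, Sequence, Tuple

def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r: Sequence[float],
                    order: int) -> Tuple[List[float], List[float], float]:
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (lpc, parcor, error), where lpc[k-1] is the weight a_k in the
    predictor x_hat[n] = sum_k a_k * x[n-k], parcor holds the reflection
    coefficient of each stage, and error is the final prediction error energy.
    """
    a = [0.0] * (order + 1)
    parcor: List[float] = []
    error = float(r[0])
    for i in range(1, order + 1):
        if error <= 0.0:
            raise ValueError("prediction error must stay positive")
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
        parcor.append(k)
    return a[1:], parcor, error

# Autocorrelation of an ideal first-order process with coefficient 0.5.
lpc, parcor, error = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(lpc, parcor, error)
```

For a stable model every reflection coefficient has magnitude below 1, which is one reason PARCOR coefficients are convenient to store and interpolate in a parametric synthesizer.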

Perceptual Evaluation of Duration Models in Spoken Korean

  • Chung, Hyun-Song
    • 음성과학 (Speech Sciences) / Vol. 9, No. 1 / pp. 207-215 / 2002
  • Perceptual evaluation of duration models of spoken Korean was carried out based on the Classification and Regression Tree (CART) model for text-to-speech conversion. A reference set of durations was produced by a commercial text-to-speech synthesis system for comparison. The duration model which was built in the previous research (Chung & Huckvale, 2001) was applied to a Korean language speech synthesis diphone database, 'Hanmal (HN 1.0)'. The synthetic speech produced by the CART duration model was preferred in the subjective preference test by a small margin and the synthetic speech from the commercial system was superior in the clarity test. In the course of preparing the experiment, a labeled database of spoken Korean with 670 sentences was constructed. As a result of the experiment, a trained duration model for speech synthesis was obtained. The 'Hanmal' diphone database for Korean speech synthesis was also developed as a by-product of the perceptual evaluation.
