• Title/Summary/Keyword: phonetic variation

Search Result 61, Processing Time 0.029 seconds

Noise-Robust Speech Detection Using The Coefficient of Variation of Spectrum (스펙트럼의 변동계수를 이용한 잡음에 강인한 음성 구간 검출)

  • Kim Youngmin;Hahn Minsoo
    • MALSORI
    • /
    • no.48
    • /
    • pp.107-116
    • /
    • 2003
  • This paper deals with a new parameter for voice detection which is used for many areas of speech engineering such as speech synthesis, speech recognition and speech coding. CV (Coefficient of Variation) of speech spectrum as well as other feature parameters is used for the detection of speech. CV is calculated only in the specific range of speech spectrum. Average magnitude and spectral magnitude are also employed to improve the performance of detector. From the experimental results the proposed voice detector outperformed the conventional energy-based detector in the sense of error measurements.

  • PDF

Improvement of MP3-Based Music Summarization Using Linear Regression (선형 근사를 통한 MP3 음악 요약의 성능 향상)

  • Koh, Seo-Young;Park, Jeong-Sik;Oh, Yung-hwan
    • Proceedings of the KSPS conference
    • /
    • 2005.11a
    • /
    • pp.55-58
    • /
    • 2005
  • Music Summarization is to extract there presentative section of a song such as chorus or motif. In previous work, the length of music summarization was fixed, and the threshold to determine the chorus section was so sensitive that the tuning was needed. Also, the rapid change of rhythm or variation of sound effects make the chorus extraction errors. We suggest the linear regression for extracting the changeable length and for minimizing the effects of threshold variation. The experimental result shows that proposed method outperforms conventional one.

  • PDF

Analyzing vowel variation in Korean dialects using phone recognition

  • Jooyoung Lee;Sunhee Kim;Minhwa Chung
    • Phonetics and Speech Sciences
    • /
    • v.15 no.4
    • /
    • pp.101-107
    • /
    • 2023
  • This study aims to propose an automatic method of detecting vowel variation in the Korean dialects of Gyeong-sang and Jeol-la. The method is based on error patterns extracted using phone recognition. Canonical and recognized phone sequences are compared, and statistical analyses distinguish the vowels appearing in both dialects, the dialect-common vowels, and the vowels with high mismatch rates for each dialect. The dialect-common vowels show monophthongization of diphthongs. The vowels unique to the dialects are /we/ to [e] and /ʌ/ to [ɰ] for Gyeong-sang dialect, and /ɰi/ to [ɯ] in Jeol-la dialect. These results corroborate previous dialectology reports regarding phonetic realization of the Korean dialects. The current method provides a possibility of automatic explanation of the dialect patterns.

Classification of Diphthongs using Acoustic Phonetic Parameters (음향음성학 파라메터를 이용한 이중모음의 분류)

  • Lee, Suk-Myung;Choi, Jeung-Yoon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.32 no.2
    • /
    • pp.167-173
    • /
    • 2013
  • This work examines classification of diphthongs, as part of a distinctive feature-based speech recognition system. Acoustic measurements related to the vocal tract and the voice source are examined, and analysis of variance (ANOVA) results show that vowel duration, energy trajectory, and formant variation are significant. A balanced error rate of 17.8% is obtained for 2-way diphthong classification on the TIMIT database, and error rates of 32.9%, 29.9%, and 20.2% are obtained for /aw/, /ay/, and /oy/, for 4-way classification, respectively. Adding the acoustic features to widely used Mel-frequency cepstral coefficients also improves classification.

A Study On the Proportional Difference of Segments in Imitating Voice (모방발화에 나타나는 분절음의 비율연구)

  • Park, Ji-Hye;Shin, Ji-Young;Kang, Sun-Mee
    • Proceedings of the KSPS conference
    • /
    • 2004.05a
    • /
    • pp.205-208
    • /
    • 2004
  • The aim of this study is to analyse the adjustment of the proportion of segment duration in imitating voice. When imitating others' voices, how far is his/her original proportion of segment duration adjusted, and what is this adjustment like under various segments? In this study, I classified segments into consonants and vowels and consonants classified into obstruents and sonorants. The result of the analysis is as follows. ; (1)Individual variation in the proportion of obstruent is not significant, and when imitating, and its distribution is not typicalized. (2) Vowels has individual variation in the proportion of segment duration even under imitating. (3) Nasal has the most distinct individual variation even under imitating, compared with vowel and obstruent. For the further study, I should examine the characteristics of quantitative and qualitative changes in liquid (among sonorant) to find out which segment can best describe personnel characteristics of the proportion of segment duration in imitating voice.

  • PDF

Performance Improvement of Connected Digit Recognition with Channel Compensation Method for Telephone speech (채널보상기법을 사용한 전화 음성 연속숫자음의 인식 성능향상)

  • Kim Min Sung;Jung Sung Yun;Son Jong Mok;Bae Keun Sung
    • MALSORI
    • /
    • no.44
    • /
    • pp.73-82
    • /
    • 2002
  • Channel distortion degrades the performance of speech recognizer in telephone environment. It mainly results from the bandwidth limitation and variation of transmission channel. Variation of channel characteristics is usually represented as baseline shift in the cepstrum domain. Thus undesirable effect of the channel variation can be removed by subtracting the mean from the cepstrum. In this paper, to improve the recognition performance of Korea connected digit telephone speech, channel compensation methods such as CMN (Cepstral Mean Normalization), RTCN (Real Time Cepatral Normalization), MCMN (Modified CMN) and MRTCN (Modified RTCN) are applied to the static MFCC. Both MCMN and MRTCN are obtained from the CMN and RTCN, respectively, using variance normalization in the cepstrum domain. Using HTK v3.1 system, recognition experiments are performed for Korean connected digit telephone speech database released by SITEC (Speech Information Technology & Industry Promotion Center). Experiments have shown that MRTCN gives the best result with recognition rate of 90.11% for connected digit. This corresponds to the performance improvement over MFCC alone by 1.72%, i.e, error reduction rate of 14.82%.

  • PDF

Design and Implementation of Vocal Sound Variation Rules for Korean Language (한국어 음운 변동 처리 규칙의 설계 및 구현)

  • Lee, Gye-Young
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.3
    • /
    • pp.851-861
    • /
    • 1998
  • Korean language is to be characterized by the rich vocal sound variation. In order to increase the probability of vocal sound recognition and to provide a natural vocal sound synthesis, a systematic and thorough research into the characteristics of Korean language including its vocal sound changing rules is required. This paper addresses an effective way of vocal sound recognition and synthesis by providing the design and implementation of the Korean vocal sound variation rule. The regulation we followed for the design of the vocal sound variation rule is the Phonetic Standard(Section 30. Chapter 7) of the Korean Orthographic Standards. We have first factor out rules for each regulations, then grouped them into 27 groups for eaeh final-consonant. The Phonological Change Processing System suggested in the paper provides a fast processing ability for vocal sound variation by a single application of the rule. The contents of the process for information augmented to words or the stem of innected words are included in the rules. We believe that the Phonological Change Processing System will facilitate the vocal sound recognition and synthesis by the sentence. Also, this system may be referred as an example for similar research areas.

  • PDF

A Study on Vowel Formant Variation by Vocal Tract Modification (성도 변형에 따른 모음 포먼트의 변화 고찰)

  • Yang, Byung-Gon
    • Speech Sciences
    • /
    • v.3
    • /
    • pp.83-92
    • /
    • 1998
  • Vowels are classified by vocal tract shapes. These shapes form constriction points along the tract, which have an influence on such vocal tract resonance as $F_l,\;F_2,\;F_3$, and so on. This study reviews the perturbation theory of the tract and determines the corresponding formant frequencies from modified vocal tracts using vocal tract area function. Then, formant variation is observed from the theory. Finally, each set of $F_l,\;F_2,\;and\;F_3$ frequency is input to a speech synthesis software to make a vowel sound. Auditory impression of each sound without any modification of its vocal tract shape is almost the same as the corresponding phonetic symbol. Formant frequencies of $F_l,\;F_2,\;F_3$ vary according to the perturbation theory. Generally, constriction along the node causes formant values to decrease while constriction along the anti-node cause it to increase. Vocal tracts modified by more than $3\;cm^2$ change vowel qualities of /a/ and /i/ into those of f /v/ and /$\varepsilon$/, respectively. This study will be helpful in simulating sounds from modified vocal tracts before any operation. Further studies are desirable to compare vocal tract shapes of various languages and their sounds together.

  • PDF

The Vowel System of American English and Its Regional Variation (미국 영어 모음 체계의 몇 가지 지역 방언적 차이)

  • Oh, Eun-Jin
    • Speech Sciences
    • /
    • v.13 no.4
    • /
    • pp.69-87
    • /
    • 2006
  • This study aims to describe the vowel system of present-day American English and to discuss some of its phonetic variations due to regional differences. Fifteen speakers of American English from various regions of the United States produced the monophthongs of English. The vowel duration and the frequencies of the first and the second formant were measured. The results indicate that the distinction between the vowels [c] and [a] has been merged in most parts of the U.S. except in some speakers from eastern and southeastern parts of the U.S., resulting in the general loss of phonemic distinction between the vowels. The phonemic merger of the two vowels can be interpreted as the result of the relatively small functional load of the [c]-[a] contrast, and the smaller back vowel space in comparison to the front vowel space. The study also shows that the F2 frequencies of the high back vowel [u] were extremely high in most of the speakers from the eastern region of the U.S., resulting in the overall reduction of their acoustic space for high vowels. From the viewpoint of the Adaptive Dispersion Theory proposed by Liljencrants & Lindblom (1972) and Lindblom (1986), the high back vowel [u] appeared to have been fronted in order to satisfy the economy of articulatory gesture to some extent without blurring any contrast between [i] and [u] in the high vowel region.

  • PDF

An Acoustical Study of Korean Diphthongs (한국어 이중모음의 음향학적 연구)

  • Yang Byeong-Gon
    • MALSORI
    • /
    • no.25_26
    • /
    • pp.3-26
    • /
    • 1993
  • The goals of the present study were (3) to collect and analyze sets of fundamental frequency (F0) and formant frequency (F1, F2, F3) data of Korean diphthongs from ten linguistically homogeneous speakers of Korean males, and (2) to make a comparative study of Korean monophthongs and diphthongs. Various definitions, kinds, and previous studies of diphthongs were examined in the introduction. Procedures for screening subjects to form a linguistically homogeneous group, time point selection and formant determination were explained in the following section. The principal findings were as follows: 1. Much variation was observed in the ongliding part of diphthongs. 2. F2 values of (j) group descended while those of [w] group ascended, 3. The average duration of diphthongs were about 110 msec, and there was not much variation between speakers and diphthongs. 4. In a comparative study of monophthongs and diphthongs, Fl and F2 values of the same offgliding part at the third time point almost converged. 5. The gliding of diphthongs was very short beginning from the h-noise. Perceptual studies using speech synthesis are desirable to find major parameters for diphthongs. The results of the present study wi11 be useful in the area of automated speech recognition and computer synthesis of speech.

  • PDF