• Title/Summary/Keyword: synchronous speech


On a study on PSOLA coding technique based on the measurement of formant similarity (포만트 유사도 측정에 의한 PSOLA 음성 부호화에 관한 연구)

  • 나덕수;이희원;김규홍;배명진
    • Proceedings of the IEEK Conference / 1998.06a / pp.607-610 / 1998
  • The major objectives of speech coding include a high compression ratio for transmission over band-limited channels, high synthesized speech quality in terms of intelligibility and naturalness, and fast processing speed. In general, speech coding methods are classified into three categories: waveform coding, source coding, and hybrid coding. In this paper, we propose a new waveform coding method using the PSOLA (pitch-synchronous overlap-add) technique. First, we fix one basic waveform per pitch period and measure the formant similarity between the basic waveform and its neighboring waveforms. Second, if the similarity satisfies the threshold values, we compress the neighboring waveform per pitch period and then store or transmit it. At a compression of about 45%, we obtained a MOS of about 4.

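As a rough illustration of the overlap-add step that PSOLA-based methods such as the one in the abstract above rely on, the following Python sketch re-spaces Hann-windowed, two-period segments of a signal. It is not the coder proposed in the paper: it assumes a constant, known pitch period, omits the formant-similarity test and the compression step, and the function name `psola_constant_pitch` and its parameters are hypothetical.

```python
import math

def psola_constant_pitch(signal, period, new_period):
    """Overlap-add two-period Hann-windowed segments at re-spaced pitch marks.

    Assumption: the input has a constant pitch period (in samples), so the
    analysis pitch marks sit at integer multiples of `period`.
    """
    n = len(signal)
    out = [0.0] * n
    # Hann window spanning two pitch periods (2 * period + 1 samples).
    window = [0.5 - 0.5 * math.cos(math.pi * i / period)
              for i in range(2 * period + 1)]
    last_mark = ((n - 1 - period) // period) * period
    t_s = period  # first synthesis pitch mark
    while t_s + period < n:
        # Pick the analysis pitch mark nearest to the synthesis mark.
        t_a = round(t_s / period) * period
        t_a = min(max(t_a, period), last_mark)
        for i in range(-period, period + 1):
            out[t_s + i] += window[i + period] * signal[t_a + i]
        t_s += new_period
    return out

period = 40
signal = [math.sin(2 * math.pi * n / period) for n in range(400)]
same = psola_constant_pitch(signal, period, period)  # pitch left unchanged
higher = psola_constant_pitch(signal, period, 30)    # shorter period, higher pitch
```

With `new_period` equal to `period`, the overlapping Hann windows sum to one, so the interior of the signal is reproduced; a smaller `new_period` packs the same segments closer together and raises the pitch.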

Error Analysis of the Exponential RLS Algorithms Applied to Speech Signal Processing

  • Yoo, Kyung-Yul
    • The Journal of the Acoustical Society of Korea / v.15 no.3E / pp.78-85 / 1996
  • The set of admissible time variations in the input signal can be separated into two categories: slow parameter changes and large parameter changes that occur infrequently. A common approach to tracking slowly time-varying parameters is the exponential recursive least-squares (RLS) algorithm. A variety of research has addressed the error analysis of the exponential RLS algorithm for slowly time-varying parameters. In this paper, the focus is on the error analysis of exponential RLS algorithms for input data with abrupt property changes. The voiced speech signal is chosen as the principal application. To analyze the error performance of the exponential RLS algorithm, its deterministic properties are first analyzed for the case of abrupt parameter changes; an impulsive input (or error variance) synchronous with the abrupt change of the parameter vector actually enhances the convergence of the exponential RLS algorithm. The analysis has also been verified through simulations on a synthetic speech signal.

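For readers unfamiliar with the exponential RLS algorithm discussed in the abstract above, the following Python sketch shows the standard exponentially weighted recursion tracking a parameter vector that changes abruptly. It is a minimal illustration, not the paper's analysis or simulation setup: the forgetting factor `lam`, the initialisation constant `delta`, the two-tap model, and the synthetic noise-free data are all assumptions.

```python
import random

def rls_exponential(xs, ds, order=2, lam=0.98, delta=100.0):
    """Exponentially weighted RLS; returns the weight vector after each step."""
    w = [0.0] * order
    # P approximates the inverse of the weighted input correlation matrix.
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    history = []
    for x, d in zip(xs, ds):
        # x is the regressor vector (length `order`), d the desired sample.
        Px = [sum(P[i][j] * x[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(x[i] * Px[i] for i in range(order))
        k = [v / denom for v in Px]                      # gain vector
        e = d - sum(w[i] * x[i] for i in range(order))   # a priori error
        w = [w[i] + k[i] * e for i in range(order)]
        # P <- (P - k x^T P) / lam; x^T P equals Px^T because P is symmetric.
        P = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(order)]
             for i in range(order)]
        history.append(list(w))
    return history

# Synthetic example: the true weights jump abruptly at sample 200.
rng = random.Random(0)
u = [rng.gauss(0.0, 1.0) for _ in range(401)]
xs = [[u[n], u[n - 1]] for n in range(1, 401)]
true_w = [[1.0, -0.5] if n < 200 else [0.2, 0.8] for n in range(400)]
ds = [sum(wi * xi for wi, xi in zip(w, x)) for w, x in zip(true_w, xs)]
history = rls_exponential(xs, ds, order=2, lam=0.95)
```

A smaller `lam` forgets old data faster and so recovers more quickly after the jump, at the cost of noisier estimates when the data are noisy.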

A Study on Pitch Perception of Normal Korean (한국 성인 음성의 음도인식에 관한 연구)

  • Jeong, Ok-Ran;Kim, Hyung-Soon;Kim, Young-Tae;Sub, Jang-Su
    • Speech Sciences / v.1 / pp.315-323 / 1997
  • This study attempts to determine the fundamental frequency level of male and female voices that Koreans perceive as normal. Seventy-three college students majoring in Speech Pathology participated in the study on a voluntary basis. The subjects listened to a male voice with fundamental frequencies of 60 Hz, 80 Hz, 100 Hz, 120 Hz, 140 Hz, 160 Hz, 180 Hz, and 200 Hz, and a female voice with fundamental frequencies of 140 Hz, 160 Hz, 180 Hz, 200 Hz, 220 Hz, 240 Hz, 260 Hz, and 280 Hz. The PSOLA (Pitch Synchronous Overlap and Add) method and a harmonic modeling method of the speech signal were used to change pitch in 20 Hz steps. The voices were presented in a random order to prevent listener bias. The results were as follows. First, 46.6% judged the male voice at 120 Hz as normal, 19.2% judged 140 Hz as normal, and another 19.2% judged 160 Hz as normal. Second, 50.7% perceived the female voice at 220 Hz as normal, while 32.9% and 30.1% chose 200 Hz and 240 Hz, respectively. The problems and recommendations for a future investigation are discussed.


A Review of Contemporary Teleaudiology: Literature Review, Technology, and Considerations for Practicing

  • Kim, Jinsook;Jeon, Seungik;Kim, Dokyun;Shin, Yerim
    • Korean Journal of Audiology / v.25 no.1 / pp.1-7 / 2021
  • The scope of teleaudiology has recently drawn attention alongside telehealth because of coronavirus disease (COVID-19). As the notion has been around for more than 20 years, since 1999, it is necessary to understand it accurately and prepare for its successful implementation. Therefore, this review covers the literature on screening and diagnostic audiometry, cochlear implants and hearing aids, and aural rehabilitation; telecommunications technology in several fields of teleaudiology; and considerations for practice. Although internet-based audiological services overall showed benefits in terms of outcome and accessibility, uncertainties about cost-effectiveness and the optimal level of support, and a need for further studies of many aspects of teleaudiology, have arisen. In terms of technology, the store-and-forward (asynchronous/hybrid) and real-time (synchronous) methods were introduced, with one applied and nine registered patents recorded from 2004 to 2020 for teleaudiology inventions in the United States. Also, 10 checklists were suggested for planning teleaudiology practice, based on prior experience in hosting a teleaudiology program. In conclusion, it is hoped that this review sheds light on recognizing and improving the existing teleaudiology services and helps overcome the challenges faced in the era of the pandemic and the contactless ("untact") world to come.

A Review of Contemporary Teleaudiology: Literature Review, Technology, and Considerations for Practicing

  • Kim, Jinsook;Jeon, Seungik;Kim, Dokyun;Shin, Yerim
    • Journal of Audiology & Otology / v.25 no.1 / pp.1-7 / 2021
  • The scope of teleaudiology has recently drawn attention alongside telehealth because of coronavirus disease (COVID-19). As the notion has been around for more than 20 years, since 1999, it is necessary to understand it accurately and prepare for its successful implementation. Therefore, this review covers the literature on screening and diagnostic audiometry, cochlear implants and hearing aids, and aural rehabilitation; telecommunications technology in several fields of teleaudiology; and considerations for practice. Although internet-based audiological services overall showed benefits in terms of outcome and accessibility, uncertainties about cost-effectiveness and the optimal level of support, and a need for further studies of many aspects of teleaudiology, have arisen. In terms of technology, the store-and-forward (asynchronous/hybrid) and real-time (synchronous) methods were introduced, with one applied and nine registered patents recorded from 2004 to 2020 for teleaudiology inventions in the United States. Also, 10 checklists were suggested for planning teleaudiology practice, based on prior experience in hosting a teleaudiology program. In conclusion, it is hoped that this review sheds light on recognizing and improving the existing teleaudiology services and helps overcome the challenges faced in the era of the pandemic and the contactless ("untact") world to come.

Inter-speaker and intra-speaker variability on sound change in contemporary Korean

  • Kim, Mi-Ryoung
    • Phonetics and Speech Sciences / v.9 no.3 / pp.25-32 / 2017
  • Besides their effect on the f0 contour of the following vowel, Korean stops are undergoing a sound change in which a partial or complete consonantal merger in voice onset time (VOT) is taking place between aspirated and lax stops. Many previous studies on sound change have mainly focused on group-normative effects, that is, effects that are representative of the population as a whole. Few systematic quantitative studies of change in adult individuals have been carried out. The current study examines whether the sound change holds for individual speakers. It focuses on inter-speaker and intra-speaker variability in sound change in contemporary Korean. Speech data were collected from thirteen Seoul Korean speakers studying abroad in America. In order to minimize the possible effects of speech production, socio-phonetic factors such as age, gender, dialect, speech rate, and L2 exposure period were controlled when recruiting participants. The results showed that, for nine out of thirteen speakers, the consonantal merger is taking place between the aspirated and lax stops in terms of VOT. There were also intra-speaker variations in the merger in three respects: First, is the consonantal (VOT) merger between the two stops in progress or not? Second, are VOTs for aspirated stops getting shorter or not (i.e., the aspirated-shortening process)? Third, are VOTs for lax stops getting longer or not (i.e., the lax-lengthening process)? The remarkable inter-speaker and intra-speaker variability indicates a synchronous speech sound change of the stop system in contemporary Korean. Some speakers are early adopters or active propagators of sound change whereas others are not. Further study is necessary to see whether the inter-speaker differences exceed intra-speaker differences in sound change.

A Korean Flight Reservation System Using Continuous Speech Recognition

  • Choi, Jong-Ryong;Kim, Bum-Koog;Chung, Hyun-Yeol;Nakagawa, Seiichi
    • The Journal of the Acoustical Society of Korea / v.15 no.3E / pp.60-65 / 1996
  • This paper describes a Korean continuous speech recognition system for flight reservation. It adopts a frame-synchronous One-Pass DP search algorithm driven by the syntactic constraints of a context-free grammar (CFG). For recognition, 48 phoneme-like units (PLUs) were defined and used as basic units for acoustic modeling of Korean. This modeling was conducted using an HMM technique, where each model has 4 states, 3 continuous output probability distributions, and 3 discrete duration distributions. Language modeling by CFG was also applied to the task domain of flight reservation, which consisted of 346 words and 422 rewriting rules. In the tests, a sentence recognition rate of 62.6% was obtained after speaker adaptation.


A 4 kbps PSI-VSELP Speech Coding Algorithm (4 kbps PSI-VSELP 음성 부호화 알고리듬)

  • Choi, Yong-Soo;Kang, Hong-Goo;Park, Sang-Wook;Youn, Dae-Hee
    • The Journal of the Acoustical Society of Korea / v.15 no.6 / pp.59-65 / 1996
  • This paper proposes a 4 kbps PSI-VSELP (Pitch Synchronous Innovation-Vector Sum Excited Linear Prediction) speech coder which produces speech equivalent to that of the conventional 4.8 kbps VSELP. Since 'half-rate' is defined differently from country to country, there may be a need to reduce the bit rate of the conventional half-rate coder. To minimize the degradation of speech quality caused by bit-rate reduction, it is desirable to perform bit allocation based on careful consideration of the effect of the various transmission parameters. This paper adopts this analytical approach for bit allocation at 4 kbps. To improve the quality of the VSELP coder at 4 kbps, the basis vectors, which play the most important role in the performance, are optimized by an iterative closed-loop training process, and the PSI technique is employed in the VSELP coder. To demonstrate the performance of the proposed speech coder, we performed experiments under noiseless and error-free conditions. In the experimental results, even though the proposed 4 kbps PSI-VSELP coder showed lower scores in the objective measure, it obtained higher scores in the subjective measure than the conventional 4.8 kbps VSELP.


Coordinative movement of articulators in bilabial stop /p/

  • Son, Minjung
    • Phonetics and Speech Sciences / v.10 no.4 / pp.77-89 / 2018
  • Speech articulators are coordinated for the purpose of segmental constriction in terms of a task. In particular, vertical jaw movements repeatedly contribute to consonantal as well as vocalic constriction. The current study explores vertical jaw movements in conjunction with bilabial constriction in bilabial stop /p/ in the context /a/-to-/a/. Revisiting kinematic data of /p/ collected using the electromagnetic midsagittal articulometer (EMMA) method from seven (four female and three male) speakers of Seoul Korean, we examined maximum vertical jaw position, its relative timing with respect to the upper and lower lips, and lip aperture minima. The results for those dependent variables are recapitulated in terms of linguistic (different word boundaries) and paralinguistic (different speech rates) factors as follows. Firstly, maximum jaw height was lower in the across-word boundary condition (across-word < within-word), but it did not differ as a function of speech rate (comfortable = fast). Secondly, more reduction in the lip aperture (LA) gesture occurred at the fast rate, while word-boundary effects were absent. Thirdly, jaw raising was still in progress after the lips' positional extrema were achieved in the within-word condition, while the former was completed before the latter in the across-word condition. Lastly, relative temporal lags between the jaw and the lips (UL and LL) were more synchronous at the fast rate than at the comfortable rate. When these results are considered together, it is possible to posit that speakers are not tolerant of lenition to the extent that it is potentially realized as a labial approximant in either word-boundary condition, while jaw height still manifested a lower jaw position in the across-word boundary condition. Early termination of vertical jaw maxima before vertical lower lip maxima in the across-word condition may be partly responsible for the spatial reduction of jaw raising movements. This may come about as a consequence of an excessive number of factors (e.g., upper lip height (UH), lower lip height (LH), jaw angle (JA)) for the representation of a vector with two degrees of freedom (x, y) engaged in a gesture-based task (e.g., lip aperture (LA)). In the task-dynamic application toolkit, the jaw angle parameter can be assigned numerical values for greater weight in the across-word boundary condition, which in turn gives rise to a lower jaw position. Speech rate-dependent spatial reduction in lip aperture may be resolved by manipulating the activation time of an active tract variable at the gestural score level.

Automatic Speaker Identification by Sustained Vowel Phonation (지속적으로 발성한 모음에 의한 화자인식)

  • Bae, Geon-Seong
    • The Journal of the Acoustical Society of Korea / v.11 no.1 / pp.35-41 / 1992
  • A speaker identification scheme using the speaker-based VQ codebook of a sustained vowel is proposed and tested. With the pitch-synchronous LPC vector of the sustained vowel /i/ as a feature vector, a VQ codebook size of 4 was found to be suitable to characterize each speaker's feature space. For 40 normal speakers (20 males, 20 females), we achieved a correct identification rate of 99.4% with a training data set, and 89.4% with a test data set with speech samples of only 50 pitch periods.

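The codebook-based identification described in the abstract above can be sketched roughly as follows: train one small VQ codebook per speaker, then assign a test utterance to the speaker whose codebook gives the lowest average quantization distortion. This is only an illustrative sketch. The abstract does not say how the codebooks were trained, so plain k-means with a deterministic initialisation is an assumption, the two-dimensional Gaussian toy vectors stand in for the pitch-synchronous LPC vectors, and all function and speaker names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, size=4, iters=20):
    """Train a VQ codebook with k-means (assumed; the source names no method)."""
    # Deterministic initialisation: evenly spaced training vectors.
    step = len(vectors) // size
    codebook = [list(vectors[i * step]) for i in range(size)]
    for _ in range(iters):
        cells = [[] for _ in range(size)]
        for v in vectors:
            nearest = min(range(size), key=lambda c: sq_dist(v, codebook[c]))
            cells[nearest].append(v)
        for c, cell in enumerate(cells):
            if cell:  # keep the old code vector if a cell is empty
                codebook[c] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

def avg_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest code vector."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def identify(vectors, codebooks):
    """Return the speaker whose codebook quantizes the vectors best."""
    return min(codebooks, key=lambda name: avg_distortion(vectors, codebooks[name]))

def toy_features(center, n, rng):
    """Hypothetical stand-in for LPC feature vectors: Gaussian points."""
    return [[c + rng.gauss(0.0, 0.3) for c in center] for _ in range(n)]

rng = random.Random(1)
codebooks = {
    "speaker_A": train_codebook(toy_features((0.0, 0.0), 200, rng)),
    "speaker_B": train_codebook(toy_features((3.0, 3.0), 200, rng)),
}
test_vectors = toy_features((3.0, 3.0), 50, rng)  # 50 vectors, echoing 50 pitch periods
result = identify(test_vectors, codebooks)
```

The codebook size of 4 mirrors the value reported in the abstract; everything else about the data and training is illustrative.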