• Title/Summary/Keyword: speech features


An Acoustic Study of Prosodic Features of Korean Spoken Language and Korean Folk Song (Minyo) (언어와 민요의 운율 자질에 관한 음향음성학적 연구)

  • Koo, Hee-San
    • Speech Sciences / v.10 no.3 / pp.133-144 / 2003
  • The purpose of this acoustic experimental study was to investigate the interrelation between the prosodic features of Korean spoken language and those of Korean folk songs. For the spoken-language analysis, the words of Changbutaryoung were spoken by three female graduate students; for the musical analysis, the song was sung by three Kyunggi Minyo singers. Pitch contours were analyzed from sound spectrograms produced with Pitch Works. The results showed that the special musical voices (breaking, tinkling, vibrating, etc.) and tunes (rising, falling, level, etc.) of the folk song occurred at the same positions as the accents of the spoken language. Although the pitch contour patterns differed from each other, there appeared to be a positive interrelation between the prosodic features of Korean spoken language and those of Korean folk songs.


Quality Improvement of Bandwidth Extended Speech Using Mixed Excitation Model (혼합여기모델을 이용한 대역 확장된 음성신호의 음질 개선)

  • Choi Mu Yeol;Kim Hyung Soon
    • MALSORI / no.52 / pp.133-144 / 2004
  • The quality of narrowband speech can be enhanced by bandwidth extension technology. This paper proposes a mixed excitation model and an energy compensation method based on the Gaussian Mixture Model (GMM). First, we employ a mixed excitation model that has both periodic and aperiodic characteristics in the frequency domain. We use a filter bank to extract periodicity features from the filtered signals and model them with a GMM to estimate the mixed excitation. Second, we separate the acoustic space into the voiced and unvoiced parts of speech to compensate more accurately for the energy difference between the narrowband speech and the reconstructed highband or lowband speech. Objective and subjective evaluations show that the wideband speech reconstructed by the proposed method is of higher quality than that produced by the conventional bandwidth extension method.


An Encrypted Speech Retrieval Scheme Based on Long Short-Term Memory Neural Network and Deep Hashing

  • Zhang, Qiu-yu;Li, Yu-zhou;Hu, Ying-jie
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.6 / pp.2612-2633 / 2020
  • With the explosive growth of multimedia speech data, protecting the privacy of speech data and retrieving it efficiently have become active research topics in recent years. In this paper, we propose an encrypted speech retrieval scheme based on a long short-term memory (LSTM) neural network and deep hashing. The scheme not only achieves efficient retrieval of massive speech data in a cloud environment, but also effectively avoids the risk of sensitive information leakage. First, a novel speech encryption algorithm based on a 4D quadratic autonomous hyperchaotic system is proposed to ensure the privacy and security of speech data in the cloud. Second, an integrated LSTM network model and deep hashing algorithm are used to extract high-level features of the speech data; this addresses the high dimensionality and temporal nature of speech data and increases the retrieval efficiency and retrieval accuracy of the proposed scheme. Finally, the normalized Hamming distance algorithm is used for matching. Compared with existing algorithms, the proposed scheme has good discrimination and robustness, and it achieves high recall, precision, and retrieval efficiency under various content-preserving operations. In addition, the proposed speech encryption algorithm has a large key space and can effectively resist exhaustive attacks.
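
The matching step named in the abstract above, normalized Hamming distance between hash codes, can be illustrated with a minimal sketch. It assumes hash codes are equal-length binary strings; the function names, the `index` dictionary, and the `threshold` value are hypothetical and do not come from the paper.

```python
def normalized_hamming(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length binary hash codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    if not a:
        raise ValueError("hash codes must be non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def retrieve(query: str, index: dict[str, str], threshold: float = 0.25):
    """Return (item id, distance) pairs with distance below the threshold, nearest first.

    `index` maps a hypothetical item id to its stored hash code; the
    threshold is an illustrative assumption, not a value from the paper.
    """
    hits = [(item_id, normalized_hamming(query, code)) for item_id, code in index.items()]
    return sorted((h for h in hits if h[1] < threshold), key=lambda h: h[1])
```

Because the distance is divided by the code length, a single threshold can be reused across hash lengths, which is presumably why the normalized form is preferred over the raw bit count.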

Acoustic Characteristics of Korean Stops in Korean Child-directed Speech (한국어 아동 지향어에 나타난 폐쇄음의 음향 음성학적 특성)

  • Kim, Min-Jung
    • Phonetics and Speech Sciences / v.1 no.3 / pp.117-122 / 2009
  • A variety of cross-linguistic studies have documented that the acoustic properties of speech addressed to young children include exaggeration of pitch contours and of acoustically salient features of phonetic units. It has been suggested that phonetic modifications in child-directed speech facilitate young children's learning of speech sounds by providing detailed phonetic information about the target word. While several studies have reported vowel modifications in speech to infants (i.e., hyper-articulated vowels), there has been little research on consonant modifications in speech to young children (except for VOT). The present study examines the acoustic properties of Korean stops in Korean mothers' speech to their children (seven children aged 27 to 38 months). Korean tense, lax, and aspirated stops are all voiceless in word-initial position and are perceptually differentiated by several acoustic parameters, including VOT, $f_0$ of the following vowel, and the amplitude difference between the first and second harmonics at the voice onset of the following vowel. This study compares the values of these parameters in Korean child-directed speech with those in adult-directed speech from the same speakers. Conclusions focus on the acoustic properties of Korean stops in child-directed speech and how they are modified to help young Korean children learn the three-way phonetic contrast.


A study on skip-connection with time-frequency self-attention for improving speech enhancement based on complex-valued spectrum (복소 스펙트럼 기반 음성 향상의 성능 향상을 위한 time-frequency self-attention 기반 skip-connection 기법 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea / v.42 no.2 / pp.94-101 / 2023
  • A deep neural network composed of encoders and decoders, such as U-Net, that is used for speech enhancement concatenates the encoder to the decoder through skip-connections. Skip-connections help reconstruct the enhanced spectrum and compensate for lost information. However, the encoder and decoder features joined by a skip-connection are incompatible with each other. In this paper, for speech enhancement based on the complex-valued spectrum, a Self-Attention (SA) method is applied to the skip-connection to transform the encoder features so that they are compatible with the decoder features. SA is a technique that, when generating an output sequence in sequence-to-sequence tasks, uses a weighted average of the input to focus attention on subsets of the input, and noise can be effectively eliminated when it is applied to speech enhancement. Three models that use encoder and decoder features to apply SA to the skip-connection are studied. In experiments on the TIMIT database, the proposed methods show improvements in all evaluation metrics compared with the Deep Complex U-Net (DCUNET) with skip-connection only.
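
The abstract above describes self-attention as a weighted average of the input. A minimal sketch of that idea follows, using plain scaled dot-product attention over a list of feature vectors. It is an illustration only: it omits the learned query/key/value projections and is not the time-frequency SA skip-connection the paper proposes. The function names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Replace each frame with an attention-weighted average of all frames.

    `frames` is a list of equal-length feature vectors. Weights come from
    scaled dot-product similarity; no learned projections are applied
    (an assumption made to keep the sketch short).
    """
    dim = len(frames[0])
    output = []
    for query in frames:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in frames
        ]
        weights = softmax(scores)
        output.append(
            [sum(w * frame[i] for w, frame in zip(weights, frames)) for i in range(dim)]
        )
    return output
```

Each output frame is a convex combination of the input frames, so frames similar to the query contribute most and dissimilar ones are down-weighted.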

Robust Feature Extraction for Voice Activity Detection in Nonstationary Noisy Environments (음성구간검출을 위한 비정상성 잡음에 강인한 특징 추출)

  • Hong, Jungpyo;Park, Sangjun;Jeong, Sangbae;Hahn, Minsoo
    • Phonetics and Speech Sciences / v.5 no.1 / pp.11-16 / 2013
  • This paper proposes a robust feature extraction method for accurate voice activity detection (VAD). VAD is one of the principal modules in speech signal processing applications such as speech codecs, speech enhancement, and speech recognition. Noisy environments contain nonstationary noises, which cause VAD accuracy to decline drastically because the fluctuation of features in noise intervals increases the false alarm rate. To improve VAD performance, this paper proposes harmonic-weighted energy. This feature extraction method focuses on voiced speech intervals and weighted harmonic-to-noise ratios to determine the amount of harmonicity in the frame energy. For performance evaluation, receiver operating characteristic curves and the equal error rate are measured.
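
The evaluation measures named in the abstract above, the receiver operating characteristic (ROC) curve and the equal error rate (EER), can be sketched as follows. The sketch assumes each frame has a scalar detector score where higher means "more likely speech"; the score lists and function names are hypothetical and do not reproduce the paper's experiments.

```python
def roc_points(speech_scores, noise_scores):
    """Return (threshold, false_alarm_rate, miss_rate) for every candidate threshold.

    A frame is classified as speech when its score is >= the threshold.
    """
    thresholds = sorted(set(speech_scores) | set(noise_scores))
    points = []
    for t in thresholds:
        false_alarm = sum(s >= t for s in noise_scores) / len(noise_scores)
        miss = sum(s < t for s in speech_scores) / len(speech_scores)
        points.append((t, false_alarm, miss))
    return points


def equal_error_rate(speech_scores, noise_scores):
    """Approximate the EER as the operating point where the two error rates are closest."""
    _, false_alarm, miss = min(
        roc_points(speech_scores, noise_scores),
        key=lambda p: abs(p[1] - p[2]),
    )
    return (false_alarm + miss) / 2
```

With a finite set of scores the two error rates may never be exactly equal, so the sketch averages them at the closest operating point; interpolating between neighboring points is a common refinement.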

Modality-Based Sentence-Final Intonation Prediction for Korean Conversational-Style Text-to-Speech Systems

  • Oh, Seung-Shin;Kim, Sang-Hun
    • ETRI Journal / v.28 no.6 / pp.807-810 / 2006
  • This letter presents a prediction model for sentence-final intonations for Korean conversational-style text-to-speech systems in which we introduce the linguistic feature of 'modality' as a new parameter. Based on their function and meaning, we classify tonal forms in speech data into tone types meaningful for speech synthesis and use the result of this classification to build our prediction model using a tree structured classification algorithm. In order to show that modality is more effective for the prediction model than features such as sentence type or speech act, an experiment is performed on a test set of 970 utterances with a training set of 3,883 utterances. The results show that modality makes a higher contribution to the determination of sentence-final intonation than sentence type or speech act, and that prediction accuracy improves up to 25% when the feature of modality is introduced.


Implementation of Hidden Markov Model based Speech Recognition System for Teaching Autonomous Mobile Robot (자율이동로봇의 명령 교시를 위한 HMM 기반 음성인식시스템의 구현)

  • 조현수;박민규;이민철
    • Proceedings of the Institute of Control, Robotics and Systems Conference (제어로봇시스템학회:학술대회논문집) / 2000.10a / pp.281-281 / 2000
  • This paper presents an implementation of a speech recognition system for teaching an autonomous mobile robot. Using human speech as the teaching method provides a more convenient user interface for the mobile robot. To make teaching the mobile robot easy, this study investigates an autonomous mobile robot equipped with a speech recognition function. In the speech recognition system, a recognition algorithm using a Hidden Markov Model (HMM) is presented to recognize Korean words. A filter-bank analysis model is used as the spectral analysis method to extract features. A recognized word is converted into a command for controlling robot navigation.

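
HMM-based isolated-word recognition, as described in the abstract above, typically scores an observation sequence against one model per word and picks the most likely word. The sketch below uses the forward algorithm on discrete observation symbols. It assumes the filter-bank features have already been quantized to integer symbols, and the two word models in `MODELS` are invented for illustration, not taken from the paper.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete-observation HMM via the forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


def recognize(obs, models):
    """Return the word whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda word: forward_likelihood(obs, *models[word]))


# Hypothetical two-state left-to-right models over symbols {0, 1}.
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
MODELS = {
    "go": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),
    "stop": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),
}
```

For long utterances the raw probabilities underflow, so practical systems run the same recursion in the log domain or rescale `alpha` at each step. The recognized word would then be mapped to a navigation command.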

SPEECH SYNTHESIS USING LARGE SPEECH DATA-BASE

  • Lee, Kyu-Keon;Mochida, Takemi;Sakurai, Naohiro;Shirai, Katasuhiko
    • Proceedings of the Acoustical Society of Korea Conference / 1994.06a / pp.949-956 / 1994
  • In this paper, we introduce a new speech synthesis method for arbitrary Japanese and Korean sentences using a natural speech data-base. We also discuss the application of this method to a CAI system. In our synthesis method, a basic sentence and basic accent phrases are selected from the data-base for a target sentence. The selection factors are phrase dependency structure (separation degree), number of morae, accent type, and phonemic labels. The target pitch pattern and phonemic parameter series are generated from the selected basic units. Because the pitch pattern is generated from patterns extracted directly from real speech, it is expected to be more natural than any pattern estimated by a model. So far, we have examined this method on Japanese sentence speech and confirmed that the synthetic sound preserves human-like features fairly well. We now extend the method to Korean sentence speech synthesis. Furthermore, we are trying to apply this synthesis unit to a CAI system.


Auditory-Perceptual Variables of Speech Evaluation in Dysarthria Literature (마비말장애 연구문헌에서 살펴본 말평가의 청지각적 요소)

  • Suh, Mee-Kyung;Kim, Hyang-Hee
    • Speech Sciences / v.13 no.3 / pp.197-206 / 2006
  • Perceptual judgment is frequently used to evaluate dysarthric speech. Although most speech pathologists and researchers focus on the 38 perceptual features provided by Darley, Aronson & Brown (1969) during evaluation, the literature describes additional characteristics that may be useful for describing dysarthria. We reviewed previous dysarthria literature and selected 46 perceptual characteristics that can be examined at various subsystems of speech production. We also provide explanations and a rationale for the rating method for each perceptual characteristic. This work may help provide a basis for developing a diagnostic tool for dysarthria.
