• Title/Summary/Keyword: Speech Recording

Search Result 97, Processing Time 0.023 seconds

Robustness of Bimodal Speech Recognition on Degradation of Lip Parameter Estimation Performance (음성인식에서 입술 파라미터 열화에 따른 견인성 연구)

  • Kim, Jin-Young;Min, So-Hee;Choi, Seung-Ho
    • Speech Sciences
    • /
    • v.10 no.2
    • /
    • pp.27-33
    • /
    • 2003
  • Bimodal speech recognition based on lip reading has been studied as a representative method of speech recognition under noisy environments. There are three integration methods of speech and lip modalities as like direct identification, separate identification and dominant recording. In this paper we evaluate the robustness of lip reading methods under the assumption that lip parameters are estimated with errors. We show that the dominant recording approach is more robust than other methods through lip reading experiments.

  • PDF

Designing a large recording script for open-domain English speech synthesis

  • Kim, Sunhee;Kim, Hojeong;Lee, Yooseop;Kim, Boryoung;Won, Yongkook;Kim, Bongwan
    • Phonetics and Speech Sciences
    • /
    • v.13 no.3
    • /
    • pp.65-70
    • /
    • 2021
  • This paper proposes a method for designing a large recording script for open domain English speech synthesis. For read-aloud style text, 12 domains and 294 sub-domains were designed using text contained in five different news media publications. For conversational style text, 4 domains and 36 sub-domains were designed using movie subtitles. The final script consists of 43,013 sentences, 27,085 read-aloud style sentences, and 15,928 conversational style sentences, consisting of 549,683 tokens and 38,356 types. The completed script is analyzed using four criteria: word coverage (type coverage and token coverage), high-frequency vocabulary coverage, phonetic coverage (diphone coverage and triphone coverage), and readability. The type coverage of our script reaches 36.86% despite its low token coverage of 2.97%. The high-frequency vocabulary coverage of the script is 73.82%, and the diphone coverage and triphone coverage of the whole script is 86.70% and 38.92%, respectively. The average readability of whole sentences is 9.03. The results of analysis show that the proposed method is effective in producing a large recording script for English speech synthesis, demonstrating good coverage in terms of unique words, high-frequency vocabulary, phonetic units, and readability.

Comparison of Speaker's Source Characteristics in Different Recording Environments by Using Phonation Type Index k (녹음 환경의 차이에 따른 화자의 음원 특성 비교: 발성유형지수 k를 중심으로)

  • Lee, Hoo-Dong;Kang, Sun-Mee;Park, Han-Sang;Chang, Moon-Soo
    • Speech Sciences
    • /
    • v.10 no.3
    • /
    • pp.213-224
    • /
    • 2003
  • Spoken sound includes not only speaker's source but the characteristics of vocal tract and speech radiation. This paper is based on the theory of Park[1], who proposes the Phonation Type Index k; a variable that shows the characteristic of speaker's source excluding those of speaker's vocal tract and speech radiation. With Park's theory, we collect data by changing recording environments and expanding experimental data, and analyze the data collected to see whether or not the PTI k shows good discriminating power as a variable for speaker recognition. In the experiment, we repeatedly record 8 sentences ten times for each of 5 males in the environment of a recording room and an office, extract PTI k for each speaker, and measure the discriminating power for each speaker by using the value of PTI k. The result shows that PTI k has the excellent discriminating power of speakers. We also confirm that, even if the recording environment is changed, PTI k shows similar results.

  • PDF

Research on Construction of the Korean Speech Corpus in Patient with Velopharyngeal Insufficiency (구개인두부전증 환자의 한국어 음성 코퍼스 구축 방안 연구)

  • Lee, Ji-Eun;Kim, Wook-Eun;Kim, Kwang Hyun;Sung, Myung-Whun;Kwon, Tack-Kyun
    • Korean Journal of Otorhinolaryngology-Head and Neck Surgery
    • /
    • v.55 no.8
    • /
    • pp.498-507
    • /
    • 2012
  • Background and Objectives We aimed to develop a Korean version of the velopharyngeal insufficiency (VPI) speech corpus system. Subjects and Method After developing a 3-channel simultaneous speech recording device capable of recording nasal/oral and normal compound speech separately, voice data were collected from VPI patients aged more than 10 years with/without the history of operation or prior speech therapy. This was compared to a control group for which VPI was simulated by using a french-3 nelaton tube inserted via both nostril through nasopharynx and pulling the soft palate anteriorly in varying degrees. The study consisted of three transcriptors: a speech therapist transcribed the voice file into text, a second transcriptor graded speech intelligibility and severity and the third tagged the types and onset times of misarticulation. The database were composed of three main tables regarding (1) speaker's demographics, (2) condition of the recording system and (3) transcripts. All of these were interfaced with the Praat voice analysis program, which enables the user to extract exact transcribed phrases for analysis. Results In the simulated VPI group, the higher the severity of VPI, the higher the nasalance score was obtained. In addition, we could verify the vocal energy that characterizes hypernasality and compensation in nasal/oral and compound sounds spoken by VPI patients as opposed to that characgerizes the normal control group. Conclusion With the Korean version of VPI speech corpus system, patients' common difficulties and speech tendencies in articulation can be objectively evaluated. Comparing these data with those of the normal voice, mispronunciation and dysarticulation of patients with VPI can be corrected.

A comparative analysis of metadata structures and attributes of Samsung smartphone voice recording files for forensic use (법과학적 활용을 위한 삼성 스마트폰 음성 녹음 파일의 메타데이터 구조 및 속성 비교 분석 연구)

  • Ahn, Seo-Yeong;Ryu, Se-Hui;Kim, Kyung-Wha;Hong, Ki-Hyung
    • Phonetics and Speech Sciences
    • /
    • v.14 no.3
    • /
    • pp.103-112
    • /
    • 2022
  • Due to the popularization of smartphones, most of the recorded speech files submitted as evidence of recent crimes are produced by smartphones, and the integrity (forgery) of the submitted speech files based on smartphones is emerging as a major issue in the investigation and trial process. Samsung smartphones with the highest domestic market share are distributed with built-in speech recording applications that can record calls and voice, and can edit recorded speech. Unlike editing through third-party speech (audio) applications, editing by their own builtin speech applications has a high similarity to the original file in metadata structures and attributes, so more precise analysis techniques need to prove integrity. In this study, we constructed a speech file metadata database for speech files (original files) recorded by 34 Samsung smartphones and edited speech files edited by their built-in speech recording applications. We analyzed by comparing the metadata structures and attributes of the original files to their edited ones. As a result, we found significant metadata differences between the original speech files and the edited ones.

Robustness of Bimodal Speech Recognition on Degradation of Lip Parameter Estimation Performance (음성인식에서 입술 파라미터 열화에 따른 견인성 연구)

  • Kim Jinyoung;Shin Dosung;Choi Seungho
    • Proceedings of the KSPS conference
    • /
    • 2002.11a
    • /
    • pp.205-208
    • /
    • 2002
  • Bimodal speech recognition based on lip reading has been studied as a representative method of speech recognition under noisy environments. There are three integration methods of speech and lip modalities as like direct identification, separate identification and dominant recording. In this paper we evaluate the robustness of lip reading methods under the assumption that lip parameters are estimated with errors. We show that the dominant recording approach is more robust than other methods with lip reading experiments. Also, a measure of lip parameter degradation is proposed. This measure can be used in the determination of weighting values of video information.

  • PDF

Development and Evaluation of an English Speaking Task Using Smartphone and Text-to-Speech (스마트폰과 음성합성을 활용한 영어 말하기 과제의 개발과 평가)

  • Moon, Dosik
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.16 no.5
    • /
    • pp.13-20
    • /
    • 2016
  • This study explores the effects of an video-recording English speaking task model on learners. The learning model, a form of mobile learning, was developed to facilitate the learners' output practice applying advantages of a smartphone and Text-to Speech. The survey results shows the positive effects of the speaking task on the domain of pronunciation, speaking, listening, writing in terms of students' confidence, as well as general English ability. The study further examines the possibilities and limitations of the speaking task in assisting Korean learners improve their speaking ability, who do not have sufficient exposure to English input or output practice due to the situational limitations where English is learned as a foreign language.

Designing of Speech DB for Korean Pronunciation Education (한국어 발음 교육을 위한 음성 DB 구축 방안)

  • Jung Myungsook
    • MALSORI
    • /
    • no.47
    • /
    • pp.51-72
    • /
    • 2003
  • The purpose of this paper is to design Speech Database for Korean pronunciation education. For this purpose, I investigated types of speech errors of Korean-learners, made texts for recording, which involves all types of speech errors, and showed how to gather speech data and how to tag their informations. It's natural that speech data should include Korean-learners' speech and Korean people's speech, because Speech DB that I try to develop is for teaching Korean pronunciation to foreigners. So this DB should have informations about speakers and phonetic informations, which are about phonetic value of segments and intonation of sentences. The intonation of sentence varies with the type of sentence, the structure of prosodic units, the length of a prosodic unit and so on. For this reason, Speech DB must involve tags about these informations.

  • PDF

On the Role of the Phatic Function of Intonation in Russian (러시아어 발화시 억양의 역할)

  • Park, Kun-Woo
    • Speech Sciences
    • /
    • v.4 no.1
    • /
    • pp.81-89
    • /
    • 1998
  • This paper investigates the phatic function of intonation in Russian by recording and analysing 11 female native speakers of standard Moscow Russian. This paper shows that differences in intonation pattern of a sentence are associated with differences in degree of listener's involvement in the speech. Intonation pattern of an utterance having phatic function appears to be determined by 1) the speaker's readiness to talk to evoke the listener's attention ; 2) the speaker's intention to continue the communication. Some emphasis is placed on the relationship between intonation pattern of an utterance and speaker-listener interaction.

  • PDF

Common Speech Database Collection and Validation for Communications (한국어 공통 음성 DB구축 및 오류 검증)

  • Lee Soo-jong;Kim Sanghun;Lee Youngjik
    • MALSORI
    • /
    • no.46
    • /
    • pp.145-157
    • /
    • 2003
  • In this paper, we'd like to briefly introduce Korean common speech database, which project has been started to construct a large scaled speech database since 2002. The project aims at supporting the R&D environment of the speech technology for industries. It encourages domestic speech industries and activates speech technology domestic market. In the first year, the resulting common speech database consists of 25 kinds of databases considering various recording conditions such as telephone, PC, VoIP etc. The speech database will be widely used for speech recognition, speech synthesis, and speaker identification. On the other hand, although the database was originally corrected by manual, still it retains unknown errors and human errors. So, in order to minimize the errors in the database, we tried to find the errors based on the recognition errors and classify several kinds of errors. To be more effective than typical recognition technique, we will develop the automatic error detection method. In the future, we will try to construct new databases reflecting the needs of companies and universities.

  • PDF