• Title/Abstract/Keywords: Speech Recording

97 search results (processing time: 0.017 s)

Robustness of Bimodal Speech Recognition on Degradation of Lip Parameter Estimation Performance

  • 김진영;민소희;최승호
    • 음성과학 / Vol. 10, No. 2 / pp. 27-33 / 2003
  • Bimodal speech recognition based on lip reading has been studied as a representative method of speech recognition in noisy environments. There are three methods for integrating the speech and lip modalities: direct identification, separate identification, and dominant recording. In this paper we evaluate the robustness of these lip reading methods under the assumption that lip parameters are estimated with errors. Through lip reading experiments, we show that the dominant recording approach is more robust than the other methods.


Designing a large recording script for open-domain English speech synthesis

  • Kim, Sunhee;Kim, Hojeong;Lee, Yooseop;Kim, Boryoung;Won, Yongkook;Kim, Bongwan
    • 말소리와 음성과학 / Vol. 13, No. 3 / pp. 65-70 / 2021
  • This paper proposes a method for designing a large recording script for open-domain English speech synthesis. For read-aloud style text, 12 domains and 294 sub-domains were designed using text contained in five different news media publications. For conversational style text, 4 domains and 36 sub-domains were designed using movie subtitles. The final script consists of 43,013 sentences (27,085 read-aloud style sentences and 15,928 conversational style sentences), comprising 549,683 tokens and 38,356 types. The completed script is analyzed using four criteria: word coverage (type coverage and token coverage), high-frequency vocabulary coverage, phonetic coverage (diphone coverage and triphone coverage), and readability. The type coverage of our script reaches 36.86% despite its low token coverage of 2.97%. The high-frequency vocabulary coverage of the script is 73.82%, and the diphone and triphone coverage of the whole script are 86.70% and 38.92%, respectively. The average readability of all sentences is 9.03. The analysis shows that the proposed method is effective in producing a large recording script for English speech synthesis, with good coverage in terms of unique words, high-frequency vocabulary, phonetic units, and readability.

Comparison of Speaker's Source Characteristics in Different Recording Environments by Using Phonation Type Index k

  • 이후동;강선미;박한상;장문수
    • 음성과학 / Vol. 10, No. 3 / pp. 213-224 / 2003
  • Speech contains not only the speaker's source characteristics but also those of the vocal tract and speech radiation. This paper builds on the theory of Park[1], who proposes the Phonation Type Index k (PTI k), a variable that captures the characteristics of the speaker's source while excluding those of the speaker's vocal tract and speech radiation. Following Park's theory, we collect data under different recording environments with an expanded set of experimental data, and analyze the data to see whether PTI k shows good discriminating power as a variable for speaker recognition. In the experiment, each of 5 male speakers reads 8 sentences ten times in both a recording room and an office; we extract PTI k for each speaker and measure how well its value discriminates between speakers. The results show that PTI k has excellent power to discriminate between speakers. We also confirm that PTI k yields similar results even when the recording environment changes.


Research on Construction of the Korean Speech Corpus in Patient with Velopharyngeal Insufficiency

  • 이지은;김욱은;김광현;성명훈;권택균
    • Korean Journal of Otorhinolaryngology-Head and Neck Surgery / Vol. 55, No. 8 / pp. 498-507 / 2012
  • Background and Objectives: We aimed to develop a Korean version of the velopharyngeal insufficiency (VPI) speech corpus system. Subjects and Method: After developing a 3-channel simultaneous speech recording device capable of recording nasal/oral and normal compound speech separately, voice data were collected from VPI patients older than 10 years, with or without a history of surgery or prior speech therapy. These data were compared to those of a control group in which VPI was simulated by inserting a French-3 Nelaton tube through both nostrils and the nasopharynx and pulling the soft palate anteriorly to varying degrees. The study involved three transcribers: a speech therapist transcribed the voice file into text, a second transcriber graded speech intelligibility and severity, and a third tagged the types and onset times of misarticulation. The database was composed of three main tables covering (1) the speaker's demographics, (2) the condition of the recording system, and (3) the transcripts. All of these were interfaced with the Praat voice analysis program, which enables the user to extract exact transcribed phrases for analysis. Results: In the simulated VPI group, the more severe the VPI, the higher the nasalance score. In addition, we could verify the vocal energy that characterizes hypernasality and compensation in the nasal/oral and compound sounds spoken by VPI patients, as opposed to that which characterizes the normal control group. Conclusion: With the Korean version of the VPI speech corpus system, patients' common difficulties and speech tendencies in articulation can be objectively evaluated. By comparing these data with those of normal voices, the mispronunciation and dysarticulation of patients with VPI can be corrected.

A comparative analysis of metadata structures and attributes of Samsung smartphone voice recording files for forensic use

  • 안서영;유세희;김경화;홍기형
    • 말소리와 음성과학 / Vol. 14, No. 3 / pp. 103-112 / 2022
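  • With the widespread adoption of smartphones, most recordings recently submitted as criminal evidence are produced on smartphones, and the integrity of smartphone-based recording files (that is, whether they have been forged or altered) has become a major issue in investigations and trials. Samsung smartphones, which hold the largest domestic market share, ship with a built-in voice recording and editing application that supports call recording, voice recording, and editing. Unlike files edited with third-party applications, files edited with the built-in application closely resemble the original file, so more precise analysis techniques are needed to establish integrity. In this study, we analyzed the metadata structures and attributes of original recording files generated on 34 Samsung smartphone models and of files edited with the built-in voice recording and editing application, and confirmed that originals and edited files differ meaningfully in the structure and attribute values of their audio file metadata.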

Robustness of Bimodal Speech Recognition on Degradation of Lip Parameter Estimation Performance

  • 김진영;신도성;최승호
    • 대한음성학회: Conference Proceedings / 대한음성학회 November 2002 Conference Proceedings / pp. 205-208 / 2002
  • Bimodal speech recognition based on lip reading has been studied as a representative method of speech recognition in noisy environments. There are three methods for integrating the speech and lip modalities: direct identification, separate identification, and dominant recording. In this paper we evaluate the robustness of these lip reading methods under the assumption that lip parameters are estimated with errors. Through lip reading experiments, we show that the dominant recording approach is more robust than the other methods. We also propose a measure of lip parameter degradation, which can be used to determine the weight given to video information.


Development and Evaluation of an English Speaking Task Using Smartphone and Text-to-Speech

  • 문도식
    • 한국인터넷방송통신학회논문지 / Vol. 16, No. 5 / pp. 13-20 / 2016
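  • To stimulate the speaking output activities of English learners in Korea, this study developed and applied a learning model for an English speaking video task, a form of mobile English learning that draws on the advantages of smartphones and text-to-speech (TTS), and then examined the model's effects on learners. Survey results showed that the mobile-device-based English speaking video task had a positive effect on learners' confidence and skill, not only in overall English ability but also in pronunciation, speaking, listening, and writing. Based on these results, we discuss the possibilities and limitations of the speaking video task as one way to improve the English speaking ability of learners in Korea, who, because they learn English as a foreign language, lack sufficient exposure to English input and have too few opportunities for speaking output.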

Designing of Speech DB for Korean Pronunciation Education

  • 정명숙
    • 대한음성학회지:말소리 / No. 47 / pp. 51-72 / 2003
  • The purpose of this paper is to design a speech database for Korean pronunciation education. To this end, I investigated the types of speech errors made by learners of Korean, prepared recording texts that cover all of these error types, and showed how to gather the speech data and how to tag the associated information. Because the speech DB I aim to develop is for teaching Korean pronunciation to foreigners, the speech data should include both the speech of learners of Korean and the speech of native Korean speakers. The DB should therefore contain information about the speakers as well as phonetic information, namely the phonetic values of segments and the intonation of sentences. Sentence intonation varies with the type of sentence, the structure of its prosodic units, the length of each prosodic unit, and so on. For this reason, the speech DB must include tags for this information.


On the Role of the Phatic Function of Intonation in Russian

  • 박근우
    • 음성과학 / Vol. 4, No. 1 / pp. 81-89 / 1998
  • This paper investigates the phatic function of intonation in Russian by recording and analysing the speech of 11 female native speakers of standard Moscow Russian. It shows that differences in the intonation pattern of a sentence are associated with differences in the degree of the listener's involvement in the speech. The intonation pattern of an utterance with a phatic function appears to be determined by 1) the speaker's readiness to talk in order to evoke the listener's attention; 2) the speaker's intention to continue the communication. Some emphasis is placed on the relationship between the intonation pattern of an utterance and speaker-listener interaction.


Common Speech Database Collection and Validation for Communications

  • 이수종;김상훈;이영직
    • 대한음성학회지:말소리 / No. 46 / pp. 145-157 / 2003
  • In this paper, we briefly introduce the Korean common speech database, a project begun in 2002 to construct a large-scale speech database. The project aims to support the R&D environment for speech technology in industry. It encourages domestic speech industries and stimulates the domestic speech technology market. In the first year, the resulting common speech database consists of 25 kinds of databases covering various recording conditions such as telephone, PC, and VoIP. The speech database will be widely used for speech recognition, speech synthesis, and speaker identification. However, although the database was originally corrected manually, it still contains unknown errors and human errors. To minimize the errors in the database, we tried to find them based on recognition errors and classified them into several kinds. To be more effective than the typical recognition technique, we will develop an automatic error detection method. In the future, we will try to construct new databases reflecting the needs of companies and universities.
