• Title/Abstract/Keywords: Human speech

569 results found (processing time: 0.028 s)

LP 방법에 의한 한국모음의 분석과 합성 (Analysis and synthesis of Korean Vowels by LP Method)

  • 손호인;신동진;안수길
    • 대한전자공학회논문지 / Vol. 18, No. 1 / pp. 41-50 / 1981
  • Human speech contains many redundancies. To economize on communication channel capacity or memory size for computerized synthesis of human voices, it is necessary to compress the data before sending. We have treated the human speech organ as an eighth-order dynamic system that is time-varying as the person speaks. Using an analyzer of our design, eight parameters each were obtained for the vowels [아], [어], [오], [우], [으], [이], [애], and [외] of the Korean language, with considerable discrepancies between persons. Supplying those parameters to a synthesizer that we have made, we succeeded in simulating human speech for the above-mentioned Korean vowels and observed that the synthesized vowels bear all the features of the original speakers.

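The eighth-order model described in the abstract above can be illustrated with a minimal linear-prediction sketch. It assumes that "LP" refers to linear prediction solved with the autocorrelation method and the Levinson-Durbin recursion; the abstract does not describe the authors' analyzer or synthesizer, so the function names (`autocorrelation`, `levinson_durbin`, `synthesize`) and the test frame are hypothetical.

```python
import math


def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LP normal equations for the predictor coefficients.

    The predictor is x[n] ~ sum(a[k] * x[n - 1 - k] for k in range(order)).
    Returns (a, err), where err is the residual prediction error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:  # degenerate frame (e.g. silence): stop early
            break
        acc = r[i + 1] - sum(a[k] * r[i - k] for k in range(i))
        reflection = acc / err
        updated = a[:]
        updated[i] = reflection
        for k in range(i):
            updated[k] = a[k] - reflection * a[i - 1 - k]
        a = updated
        err *= 1.0 - reflection * reflection
    return a, err


def synthesize(a, excitation):
    """Run the all-pole synthesis filter driven by an excitation signal."""
    out = []
    for n, e in enumerate(excitation):
        past = sum(a[k] * out[n - 1 - k] for k in range(len(a)) if n - 1 - k >= 0)
        out.append(e + past)
    return out


if __name__ == "__main__":
    # Hypothetical test frame: two decaying sinusoids standing in for a vowel.
    frame = [math.exp(-0.01 * n) * (math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n))
             for n in range(240)]
    coeffs, err = levinson_durbin(autocorrelation(frame, 8), order=8)
    print(len(coeffs), "coefficients, residual energy", err)
```

In this sketch the eight values in `coeffs` play the role of the eight parameters per vowel, and `synthesize` would be driven by a pulse train at the speaker's pitch period to regenerate the vowel. Whether the original system used this parameterization is not stated in the abstract.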

Acoustic-Phonetic Phenotypes in Pediatric Speech Disorders; An Interdisciplinary Approach

  • Bunnell, H. Timothy
    • 대한음성학회: Conference Proceedings / 대한음성학회 2006 Fall Conference Proceedings / pp. 31-36 / 2006
  • Research in the Center for Pediatric Auditory and Speech Sciences (CPASS) is attempting to characterize, or phenotype, children with speech delays based on acoustic-phonetic evidence and to relate those phenotypes to chromosome loci believed to be related to language and speech. To achieve this goal we have adopted a highly interdisciplinary approach that merges fields as diverse as automatic speech recognition, human genetics, neuroscience, epidemiology, and speech-language pathology. In this presentation I will trace the background of this project and the rationale for our approach. Analyses based on a large amount of speech recorded from 18 children with speech delays will be presented to illustrate the approach we will be taking to characterize the acoustic-phonetic properties of disordered speech in young children. The ultimate goal of our work is to develop non-invasive and objective measures of speech development that can be used to better identify which children with apparent speech delays are most in need of, or would benefit most from, therapeutic services.


On the Role of Prefabricated Speech in L2 Acquisition Process: An Information Processing Approach

  • 부경순
    • 한국정보과학회 언어공학연구회: Conference Proceedings (Hangul and Korean Information Processing) / 한국정보과학회 언어공학연구회 1991, 3rd Conference on Hangul and Korean Information Processing / pp. 196-208 / 1991
  • This study focused on the role of prefabricated speech (routines and patterns) in the L2 acquisition process. The data for this study consisted of spontaneous speech samples and various observational records of three Korean children learning English as L2 in a nursery school. The specific questions addressed here were: (1) What routines, patterns, and creative constructions did the children use? (2) What was the general trend in the three children's use of routines, patterns, and creative constructions over time? The data were collected over a period of one school year by observing the children in their school. The findings were discussed from the perspective of human information processing. This study found that prefabricated speech played a significant role in the three children's L2 acquisition. The automatic processing of prefabricated speech appeared to reduce the burden on the children's information processing systems, which made the saved resources available for other language development activities. The children's language development was also evident in their increased use of patterns. The children were moving from heavy dependence on wholly unanalyzed routines to increased use of partly unanalyzed patterns. This increased control was the result of an increase in procedural knowledge.


한국어 공통 음성 DB구축 및 오류 검증 (Common Speech Database Collection and Validation for Communications)

  • 이수종;김상훈;이영직
    • 대한음성학회지:말소리 / No. 46 / pp. 145-157 / 2003
  • In this paper, we briefly introduce the Korean common speech database, a project started in 2002 to construct a large-scale speech database. The project aims to support the R&D environment for speech technology in industry. It encourages the domestic speech industry and stimulates the domestic speech technology market. In the first year, the resulting common speech database consisted of 25 kinds of databases covering various recording conditions such as telephone, PC, and VoIP. The speech database will be widely used for speech recognition, speech synthesis, and speaker identification. Although the database was originally corrected manually, it still retains unknown errors and human errors. To minimize the errors in the database, we tried to find them based on recognition errors and to classify them into several kinds. To be more effective than a typical recognition technique, we will develop an automatic error detection method. In the future, we will try to construct new databases reflecting the needs of companies and universities.


심리 음향 기준을 이용한 새로운 음질 개선 방법 (New Speech Enhancement Method using Psychoacoustic Criteria)

  • 김대경;박장식;손경식
    • 한국멀티미디어학회논문지 / Vol. 4, No. 1 / pp. 56-66 / 2001
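  • A spectral subtraction method based on psychoacoustic criteria was recently proposed. Virag's algorithm yields speech that is more comfortable to listen to than conventional methods, but it requires a noise-robust voice activity detector. Extended spectral subtraction, which needs no voice activity detector, leaves residual noise that becomes more audible as the signal-to-noise ratio decreases. This paper proposes a new speech enhancement method that combines a Wiener filter with spectral subtraction based on psychoacoustic criteria. Because the Wiener filter keeps updating the noise estimate even during speech segments, the proposed method needs no voice activity detector, and because the subtraction parameters are adjusted according to the masking threshold, the residual noise is almost inaudible. In simulations comparing the proposed method with conventional spectral subtraction, the speech enhanced by the proposed method was more comfortable to listen to.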

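The idea of adjusting the subtraction parameters according to the masking threshold can be sketched as follows. This is only an illustration under stated assumptions, not the method of the paper: the linear mapping from masking threshold to over-subtraction factor, the spectral floor, and all names and default values (`subtraction_factor`, `enhance_frame`, `alpha_min`, `alpha_max`, `beta`) are hypothetical, and the Wiener-filter noise update mentioned in the abstract is omitted. The masking thresholds are assumed to be supplied by a separate psychoacoustic model.

```python
def subtraction_factor(mask, mask_min, mask_max, alpha_min=1.0, alpha_max=6.0):
    """Map a masking threshold to an over-subtraction factor.

    A high threshold means residual noise is well masked, so less
    subtraction is applied; a low threshold calls for more subtraction.
    """
    if mask_max <= mask_min:
        return alpha_max  # no spread in thresholds: use the strongest setting
    t = (mask - mask_min) / (mask_max - mask_min)
    t = min(1.0, max(0.0, t))
    return alpha_max - t * (alpha_max - alpha_min)


def enhance_frame(noisy_power, noise_power, masks, beta=0.01):
    """Apply power spectral subtraction to one frame, bin by bin.

    noisy_power: power spectrum of the noisy frame
    noise_power: current noise power estimate per bin
    masks:       masking threshold per bin (assumed given)
    beta:        spectral floor, as a fraction of the noise estimate
    """
    lo, hi = min(masks), max(masks)
    enhanced = []
    for y, n, m in zip(noisy_power, noise_power, masks):
        alpha = subtraction_factor(m, lo, hi)
        # Subtract the scaled noise estimate, but never go below the floor.
        enhanced.append(max(y - alpha * n, beta * n))
    return enhanced
```

For example, `enhance_frame([10.0, 2.0], [1.0, 1.0], [0.0, 1.0])` subtracts six times the noise estimate in the first bin (low masking threshold) and only once in the second bin (high masking threshold).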

An evaluation of Korean students' pronunciation of an English passage by a speech recognition application and two human raters

  • Yang, Byunggon
    • 말소리와 음성과학 / Vol. 12, No. 4 / pp. 19-25 / 2020
  • This study examined thirty-one Korean students' pronunciation of an English passage using a speech recognition application, Speechnotes, and two Canadian raters' evaluations of their speech according to the International English Language Testing System (IELTS) band criteria to assess the possibility of using the application as a teaching aid for pronunciation education. The results showed that the grand average percentage of correctly recognized words was 77.7%. Given this moderate recognition rate, the participants' pronunciation level was construed as intermediate or higher. The recognition rate varied depending on the composition of the content words and the function words in each given sentence. Frequency counts of unrecognized words by group level and word type revealed the typical pronunciation problems of the participants, including fricatives and nasals. The IELTS bands chosen by the two native raters for the Rainbow Passage had a moderately high correlation with each other. A moderate correlation was reported between the number of correctly recognized content words and the raters' bands, while an almost negligible correlation was found between the function words and the raters' bands. From these results, the author concludes that the speech recognition application could constitute a partial aid for diagnosing each individual's or the group's pronunciation problems, but further studies are still needed to match human raters.
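The two measures in the abstract above, the percentage of correctly recognized words and the correlation with raters' bands, can be sketched as follows. The abstract does not say how recognized words were counted, so in-order word matching with `difflib.SequenceMatcher` is an assumption, as are the function names (`recognition_rate`, `pearson`) and the sample transcript.

```python
from difflib import SequenceMatcher
import math


def recognition_rate(reference, transcript):
    """Percentage of reference words matched, in order, by the transcript."""
    ref = reference.lower().split()
    hyp = transcript.lower().split()
    if not ref:
        return 0.0
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * matched / len(ref)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Reference words from the Rainbow Passage; the transcript is made up.
    reference = "the rainbow is a division of white light"
    transcript = "the rainbow is division of wide light"
    print(recognition_rate(reference, transcript))
```

Per-speaker counts of correctly recognized content words could then be passed to `pearson` together with the raters' bands. The study's actual scoring procedure may differ from this sketch.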

A 3D Audio-Visual Animated Agent for Expressive Conversational Question Answering

  • Martin, J.C.;Jacquemin, C.;Pointal, L.;Katz, B.
    • 한국정보컨버전스학회: Conference Proceedings / 한국정보컨버전스학회 2008 International conference on information convergence / pp. 53-56 / 2008
  • This paper reports on the ACQA (Animated agent for Conversational Question Answering) project conducted at LIMSI. The aim is to design an expressive animated conversational agent (ACA) for conducting research along two main lines: (1) perceptual experiments (e.g., perception of expressivity and 3D movements in both audio and visual channels); (2) design of human-computer interfaces requiring head models at different resolutions and the integration of the talking head in virtual scenes. The target application of this expressive ACA is RITEL, a real-time, speech-based question-and-answer system developed at LIMSI. The architecture of the system is based on distributed modules exchanging messages through a network protocol. The main components of the system are as follows: RITEL, a question-and-answer system that searches raw text, produces a text (the answer) and attitudinal information; this attitudinal information is then processed to deliver expressive tags; the text is converted into phoneme, viseme, and prosodic descriptions. Audio speech is generated by the LIMSI selection-concatenation text-to-speech engine. Visual speech uses MPEG-4 keypoint-based animation and is rendered in real time by Virtual Choreographer (VirChor), a GPU-based 3D engine. Finally, visual and audio speech are played in a 3D audio and visual scene. The project also puts considerable effort into realistic visual and audio 3D rendering. A new model of phoneme-dependent human radiation patterns is included in the speech synthesis system, so that the ACA can move in the virtual scene with realistic 3D visual and audio rendering.


말소리지각에 대한 종설: 음성공학과의 융복합을 위한 첫 단계 (A review of speech perception: The first step for convergence on speech engineering)

  • 이영림
    • 디지털융복합연구 / Vol. 15, No. 12 / pp. 509-516 / 2017
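  • People constantly encounter events and have little difficulty perceiving events such as speech. As with the perception of biological motion, two theories of speech perception have been in dispute. The purpose of this paper is to describe speech perception and to compare the motor theory and the direct perception theory of speech perception. Motor theorists have argued that humans have a special faculty for speech perception because they perceive and produce speech by means of motor commands. Direct perception theorists, however, have proposed that perceiving speech is no different from perceiving any other sound, because perceiving speech, like perceiving any other event, only requires directly detecting the necessary information. For convergence with speech engineering, it is important to first understand this basic human ability to perceive speech. Such a basic understanding of speech perception is therefore expected to be useful for artificial intelligence, speech recognition technology, and speech recognition systems.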