• Title/Summary/Keyword: 음성분석 및 변환

Search Result 65, Processing Time 0.029 seconds

Text Preprocessor for Generating Korean Automatic Pronunciation Variants Using Morpheme-trg Information (한국어 발음열 자동 생성을 위한 형태소 태그 정보 기반의 텍스트 전처리기)

  • 이경님;정민화
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.10b
    • /
    • pp.199-201
    • /
    • 2001
  • 일반적으로 발음열 자동 생성기는 음성 인식 및 음성 합성에 사용되며, 그 주된 역할은 입력된 한글 철자에 대해 발음 나는 데로 표기된 음소열로 출력하는 것이다. 그러나 실제 입력되는 문장에는 특수 기호 및 알파벳. 아라비아 숫자, 영어 단어, 알파벳과 숫자가 혼용된 약어, 기호 단위 명사 등이 포함되어 있다. 게다가 아라비아 숫자의 경우 단위 명사의 종류에 따라서 뿐만 아니라, 문맥에 따라 숫자를 읽는 방식이 달라지게 된다. 이러한 모든 현상들을 발음열 생성기 내부에서 처리하게 되면 선행작업이 상대적으로 크게 되어 과부하 문제 가 발생된다. 또한 어절 내의 문맥 정보만으로 정확한 변환 결과를 얻기 힘들기 때문에 형태소 분석 수행 결과 및 예외처리를 위 한 루틴을 포함하여 한글 자소 단위의 입력형식으로 변환하는 전처리 시스템을 구성하였다.

  • PDF

An Emotion Recognition Method using Facial Expression and Speech Signal (얼굴표정과 음성을 이용한 감정인식)

  • 고현주;이대종;전명근
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.6
    • /
    • pp.799-807
    • /
    • 2004
  • In this paper, we deal with an emotion recognition method using facial images and speech signal. Six basic human emotions including happiness, sadness, anger, surprise, fear and dislike are investigated. Emotion recognition using the facial expression is performed by using a multi-resolution analysis based on the discrete wavelet transform. And then, the feature vectors are extracted from the linear discriminant analysis method. On the other hand, the emotion recognition from speech signal method has a structure of performing the recognition algorithm independently for each wavelet subband and then the final recognition is obtained from a multi-decision making scheme.

Design and Implementation of Korean Voice Web Browser (한국어 음성 웹브라우저 설계 및 구현)

  • Jang, Young-Gun;Jo, Kyoung-Hwan
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.7 no.5
    • /
    • pp.458-466
    • /
    • 2001
  • This paper is addressed to a design and implementation of Korean voice web browser using voice technologies for controling web browser and selecting contents in the web document, and converting them to voice after HTML analysis. Main feature of this web browser is universal design which considers both of normal person and visual disabled, allows multi-modal interface. As voice interface for visual disabled, it supports tree structure which allows to recognize web document structure easily by only voice guidance regardless of frame usage, can handle all elements described as tag in the web document, identify them as predefined different voice property according to element property. This method gets rid of additional guidance voice for element property without audio style sheet or additional programming effort.

  • PDF

A Study on the Development of Automatic Schedule Management System through Speech Recognition Text Analysis (음성인식 텍스트 분석을 통한 자동 일정 관리 시스템 개발에 관한 연구)

  • Lee, Hae-Mi;Cho, We-Duke
    • Annual Conference of KIPS
    • /
    • 2022.05a
    • /
    • pp.279-282
    • /
    • 2022
  • 컴퓨터가 마이크 등의 소리 센서를 통해 얻은 음향학적 신호를 단어나 문장으로 변환시키는 기술인 음성 인식 기술과 인공지능 기술을 결합한 음성 대화 시스템에 대한 연구 진행 및 제품 출시가 활발하게 이루어지고 있다. 기존의 시스템을 사용하면서 날짜와 시간 외의 정보 추출 정도가 빈약하거나 자동 등록이 되지 않는 문제점을 확인하였다. 음성 인식 기술을 통해 얻은 텍스트에서 보다 많은 정보를 추출하고, 자동 등록 및 알림과 맛집 등 추가 정보 제공 시스템을 구축하는 것을 목표로 하였다.

A Study on the Natural Language Generation by Machine Translation (영한 기계번역의 자연어 생성 연구)

  • Hong Sung-Ryong
    • Journal of Digital Contents Society
    • /
    • v.6 no.1
    • /
    • pp.89-94
    • /
    • 2005
  • In machine translation the goal of natural language generation is to produce an target sentence transmitting the meaning of source sentence by using an parsing tree of source sentence and target expressions. It provides generator with linguistic structures, word mapping, part-of-speech, lexical information. The purpose of this study is to research the Korean Characteristics which could be used for the establishment of an algorism in speech recognition and composite sound. This is a part of realization for the plan of automatic machine translation. The stage of MT is divided into the level of morphemic, semantic analysis and syntactic construction.

  • PDF

Normalization of Spectral Magnitude and Cepstral Transformation for Compensation of Lombard Effect (롬바드 효과의 보정을 위한 스펙트럼 크기의 정규화와 켑스트럼 변환)

  • Chi, Sang-Mun;Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.4
    • /
    • pp.83-92
    • /
    • 1996
  • This paper describes Lombard effect compensation and noise suppression so as to reduce speech recognition error in noisy environments. Lombard effect is represented by the variation of spectral envelope of energy normalized word and the variation of overall vocal intensity. The variation of spectral envelope can be compensated by linear transformation in cepstral domain. The variation of vocal intensity is canceled by spectral magnitude normalization. Spectral subtraction is use to suppress noise contamination, and band-pass filtering is used to emphasize dynamic features. To understand Lombard effect and verify the effectiveness of the proposed method, speech data are collected in simulated noisy environments. Recognition experiments were conducted with contamination by noise from automobile cabins, an exhibition hall, telephone booths in down town, crowded streets, and computer rooms. From the experiments, the effectiveness of the proposed method has been confirmed.

  • PDF

Time Expression Analysis For Reminder Applications Using Speech Recognition (음성인식 기반 리마인더를 위한 시간 표현 분석 기법)

  • Park, Jaeseong;Lee, Sangwon;Jang, Jaena;Kang, Sangwoo
    • Annual Conference on Human and Language Technology
    • /
    • 2017.10a
    • /
    • pp.264-266
    • /
    • 2017
  • 본 연구는 리마인더 앱을 위한 효과적인 시간 표현 분석 방법을 제안한다. 시간 표현 분석을 위한 정규식 패턴을 이용하여 사용자 발화 텍스트로부터 시간 정보를 분석하고 시간 표현 유형에 따라 절대적 시간 정보로 변환한다. 제안한 방법은 정규식 패턴을 이용한 시간 표현 분석 기법으로 시스템의 유지 관리가 용이하고 정보량이 많은 패턴과의 매칭을 위해 효과적이다.

  • PDF

Design and Implementation of Simple Text-to-Speech System using Phoneme Units (음소단위를 이용한 소규모 문자-음성 변환 시스템의 설계 및 구현)

  • Park, Ae-Hee;Yang, Jin-Woo;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.3
    • /
    • pp.49-60
    • /
    • 1995
  • This paper is a study on the design and implementation of the Korean Text-to-Speech system which is used for a small and simple system. In this paper, a parameter synthesis method is chosen for speech syntheiss method, we use PARCOR(PARtial autoCORrelation) coefficient which is one of the LPC analysis. And we use phoneme for synthesis unit which is the basic unit for speech synthesis. We use PARCOR, pitch, amplitude as synthesis parameter of voice, we use residual signal, PARCOR coefficients as synthesis parameter of unvoice. In this paper, we could obtain the 60% intelligibility by using the residual signal as excitation signal of unvoiced sound. The result of synthesis experiment, synthesis of a word unit is available. The controlling of phoneme duration is necessary for synthesizing of a sentence unit. For setting up the synthesis system, PC 486, a 70[Hz]-4.5[KHz] band pass filter for speech input/output, amplifier, and TMS320C30 DSP board was used.

  • PDF

Cyber Threats Analysis of AI Voice Recognition-based Services with Automatic Speaker Verification (화자식별 기반의 AI 음성인식 서비스에 대한 사이버 위협 분석)

  • Hong, Chunho;Cho, Youngho
    • Journal of Internet Computing and Services
    • /
    • v.22 no.6
    • /
    • pp.33-40
    • /
    • 2021
  • Automatic Speech Recognition(ASR) is a technology that analyzes human speech sound into speech signals and then automatically converts them into character strings that can be understandable by human. Speech recognition technology has evolved from the basic level of recognizing a single word to the advanced level of recognizing sentences consisting of multiple words. In real-time voice conversation, the high recognition rate improves the convenience of natural information delivery and expands the scope of voice-based applications. On the other hand, with the active application of speech recognition technology, concerns about related cyber attacks and threats are also increasing. According to the existing studies, researches on the technology development itself, such as the design of the Automatic Speaker Verification(ASV) technique and improvement of accuracy, are being actively conducted. However, there are not many analysis studies of attacks and threats in depth and variety. In this study, we propose a cyber attack model that bypasses voice authentication by simply manipulating voice frequency and voice speed for AI voice recognition service equipped with automated identification technology and analyze cyber threats by conducting extensive experiments on the automated identification system of commercial smartphones. Through this, we intend to inform the seriousness of the related cyber threats and raise interests in research on effective countermeasures.

Speech syntheis engine for TTS (TTS 적용을 위한 음성합성엔진)

  • 이희만;김지영
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.23 no.6
    • /
    • pp.1443-1453
    • /
    • 1998
  • This paper presents the speech synthesis engine that converts the character strings kept in a computer memory into the synthesized speech sounds with enhancing the intelligibility and the naturalness by adapting the waveform processing method. The speech engine using demisyllable speech segments receives command streams for pitch modification, duration and energy control. The command based engine isolates the high level processing of text normalization, letter-to-sound and the lexical analysis and the low level processing of signal filtering and pitch processing. The TTS(Text-to-Speech) system implemented by using the speech synthesis engine has three independent object modules of the Text-Normalizer, the Commander and the said Speech Synthesis Engine those of which are easily replaced by other compatible modules. The architecture separating the high level and the low level processing has the advantage of the expandibility and the portability because of the mix-and-match nature.

  • PDF